Hadoop Common / HADOOP-3327

Shuffling fetchers waited too long between map output fetch re-tries

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None
    Attachments

    1. hadoop-3327.patch (15 kB) - Jothi Padmanabhan
    2. hadoop-3327-v1.patch (14 kB) - Jothi Padmanabhan
    3. hadoop-3327-v2.patch (14 kB) - Jothi Padmanabhan
    4. hadoop-3327-v3.patch (15 kB) - Jothi Padmanabhan
    5. patch-3327.txt (11 kB) - Amareshwari Sriramadasu
    6. patch-3327-1.txt (11 kB) - Amareshwari Sriramadasu
    7. patch-3327-2.txt (10 kB) - Amareshwari Sriramadasu

        Activity

        Robert Chansler added a comment -

        Editorial pass over all release notes prior to publication of 0.21.

        Hudson added a comment -

        Integrated in Hadoop-trunk #756 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/756/ )

        Devaraj Das added a comment -

        I just committed this. Thanks, Amareshwari!
        I should note that this particular patch just handles read timeouts better. There is good scope for future work here, and follow-up issues should be raised (e.g., how best to determine when to kill a (faulty) map/reduce task during shuffle).

        Amareshwari Sriramadasu added a comment -

        The contrib-test failure TestAgentConfig.testInitAdaptors_vs_Checkpoint is not related to the patch.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12399449/patch-3327-2.txt
        against trunk revision 740532.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3796/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3796/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3796/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3796/console

        This message is automatically generated.

        Amareshwari Sriramadasu added a comment -

        Attaching a patch with the change suggested by Devaraj.

        Show
        Amareshwari Sriramadasu added a comment - attaching patch with the change sugggested by Devaraj
        Hide
        Devaraj Das added a comment -

        Sorry, I just realized that we can avoid the class for handling read timeout exceptions, and instead have a thread-local variable that's set whenever a read timeout is seen...
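
        A minimal sketch of such a thread-local flag, assuming hypothetical class and method names (these are not the identifiers from the actual patch):

            import java.net.SocketTimeoutException;

            public class CopierTimeoutFlag {
              // One flag per copier thread: set when a fetch fails with a
              // read timeout, cleared before each new fetch attempt.
              private static final ThreadLocal<Boolean> readTimedOut =
                  new ThreadLocal<Boolean>() {
                    @Override
                    protected Boolean initialValue() {
                      return Boolean.FALSE;
                    }
                  };

              void copyOutput(Runnable fetch) {
                readTimedOut.set(Boolean.FALSE);     // reset before the attempt
                try {
                  fetch.run();                       // stands in for the HTTP fetch
                } catch (RuntimeException e) {
                  if (e.getCause() instanceof SocketTimeoutException) {
                    readTimedOut.set(Boolean.TRUE);  // record the read timeout
                  }
                  throw e;
                }
              }

              boolean lastFetchReadTimedOut() {
                return readTimedOut.get();
              }
            }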

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12399348/patch-3327-1.txt
        against trunk revision 740237.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3790/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3790/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3790/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3790/console

        This message is automatically generated.

        Amareshwari Sriramadasu added a comment -

        test-patch and ant tests passed on my machine.

        Amareshwari Sriramadasu added a comment -

        Reduces the waiting time between map output fetch re-tries towards the end of the shuffle by notifying the JobTracker aggressively.
        Read timeouts during the shuffle are treated differently: the reducer notifies the JobTracker immediately on a read timeout and then backs off for a longer time.

        Amareshwari Sriramadasu added a comment -

        Patch with review comments incorporated.

        Devaraj Das added a comment -

        Also, the change in JobInProgress is not required at this point IMO.

        Devaraj Das added a comment -

        Looks fine to me. We shouldn't have any new configuration overall.

        Jothi Padmanabhan added a comment -

        Looks good. A few points:

        • Some comments on the changes in the code would be good.
        • Should the percentages that we use to decide maxNotifications and fetchRetriesPerMap be configurable?
        • Since fetchRetriesPerMap is computed on every iteration from the current copiedMapOutputs.size, it is possible that we might delay a notification to the JT by one failure. For example, consider maxFetchRetriesPerMap = 5 and numRetries = 4. On the next failure numRetries = 5; say we cross the threshold and reset fetchRetriesPerMap = 2 (5/2). Under the existing logic, we would have sent a notification, since numRetries = maxFetchRetriesPerMap. With the new logic, we will wait, since 5 % 2 != 0 (see the sketch below). But this is a corner case and can probably be overlooked.
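        A tiny illustration of that corner case (the variable names mirror the comment above, not the actual patch):

            public class CornerCase {
              public static void main(String[] args) {
                int maxFetchRetriesPerMap = 5;
                int numRetries = 5;                // we just crossed the threshold
                int fetchRetriesPerMap = maxFetchRetriesPerMap / 2;  // recomputed to 2
                // Old logic: notify because numRetries == maxFetchRetriesPerMap.
                // New logic: notify only when numRetries is a multiple of
                // fetchRetriesPerMap; 5 % 2 != 0, so we wait one more failure.
                boolean notifyJT = (numRetries % fetchRetriesPerMap == 0);
                System.out.println("notify JT? " + notifyJT);  // prints false
              }
            }
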
        Amareshwari Sriramadasu added a comment -

        test-patch result:

             [exec]
             [exec] -1 overall.
             [exec]
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]
             [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
             [exec]                         Please justify why no tests are needed for this patch.
             [exec]
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec]
             [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
             [exec]
             [exec]
        

        It is not easy to write a testcase for this.

        All core and contrib unit tests passed on my machine.

        Amareshwari Sriramadasu added a comment -

        Some more numbers:

        Job | With trunk | With patch
        Sort on 200 nodes | 1hr 1min 13sec | 1hr 2min 8sec
        Sort on 200 nodes, read timeouts simulated for 10 maps (5 during the start of shuffle and 5 during the end; maxMapRunTime = 3min 43sec) | 2hr 5min 7sec | 1hr 5min 37sec (almost the same as the normal run!)
        SortValidator on 200 nodes, read timeouts for 5 maps during the start of shuffle (maxMapRunTime = 18min 58sec) | 2hr 13min 46sec | 31min 51sec
        Gridmix on 200 nodes | 5329 sec | 5187 sec
        Gridmix on 400 nodes | 3028 sec | 2903 sec

        These results show good improvement in case of fetch failures, with no performance degradation.

        Amareshwari Sriramadasu added a comment -

        Attaching a patch for review, while I continue my testing.

        I have done simple tests with the patch, with read timeouts simulated for 4 map outputs. Results are as follows:

        • On a single-node cluster with 2 maps, 1 reducer, and maxMapRuntime 6 sec:
          • Job took 41 mins 4 sec without the patch
          • Job took 22 mins 2 sec with the patch
        • On a 20-node cluster with 50 maps, 6 reducers, and maxMapRuntime 6 sec:
          • Job took 18 mins 23 sec without the patch
          • Job took 6 mins 32 sec with the patch
        • On a 20-node cluster with 50 maps, 6 reducers, and maxMapRuntime 2 mins 33 sec:
          • Job took 30 mins 0 sec without the patch
          • Job took 8 mins 30 sec with the patch
        Amareshwari Sriramadasu added a comment -

        After discussion with Jothi and Devaraj, I propose the following approach (sketched in code below):

        1. If pendingCopies < 0.25 * numMaps, // towards the end of shuffle
        fetchRetries = maxFetchRetriesPerMap/2;
        // this will send first notification to JT in half the time of the existing algorithm.
        // Also exponential back-off is half the number of times.

        2. If failure is because of ReadTimeOut,
        send notification to JT immediately.
        towards the end of shuffle, back off for min (maxMapRunTime/2, current backoff);
        else back off for maxMapRunTime/2.

        3. At JT,
        if freeMapSlots < 0.5 * totalMapSlots, re-execute the map after 3 notifications. (current algorithm)
        else re-execute the map after 2 notifications.

        Thoughts?
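
        A minimal sketch of steps 1 and 2 on the reducer side (the names pendingCopies, numMaps, maxFetchRetriesPerMap, and maxMapRunTime follow the proposal above; the helper class itself is hypothetical):

            public class ShuffleRetryPolicy {
              // Step 1: halve the retry budget towards the end of the shuffle.
              static int fetchRetries(int pendingCopies, int numMaps,
                                      int maxFetchRetriesPerMap) {
                boolean endOfShuffle = pendingCopies < 0.25 * numMaps;
                return endOfShuffle ? maxFetchRetriesPerMap / 2
                                    : maxFetchRetriesPerMap;
              }

              // Step 2: backoff after a read timeout (the JT has already been
              // notified immediately); all times are in milliseconds.
              static long readTimeoutBackOff(int pendingCopies, int numMaps,
                                             long maxMapRunTime, long currentBackOff) {
                boolean endOfShuffle = pendingCopies < 0.25 * numMaps;
                return endOfShuffle ? Math.min(maxMapRunTime / 2, currentBackOff)
                                    : maxMapRunTime / 2;
              }
            }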

        Hudson added a comment -

        Integrated in Hadoop-trunk #618 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/618/ )

        Devaraj Das added a comment -

        I reverted the patch on both trunk and 0.19 branch.

        Jothi Padmanabhan added a comment -

        We observed that this patch, while reducing the time for re-execution of maps on failures, impacts performance negatively for normal runs on regular clusters. Should we revert this patch till we come up with the correct solution?

        Hudson added a comment -

        Integrated in Hadoop-trunk #581 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/ )

        Devaraj Das added a comment -

        I just committed this. Thanks, Jothi!

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12386719/hadoop-3327-v3.patch
        against trunk revision 679202.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2931/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2931/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2931/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2931/console

        This message is automatically generated.

        Jothi Padmanabhan added a comment -

        Attaching patch after incorporating the review comments

        Devaraj Das added a comment -

        Sorry for the long turn-around on this one. There are two things that should be addressed:
        1) Convert the error types to an enum.
        2) There is a copy-paste error in an if-else clause (e.getClass() == ConnTimeoutException.class); the check in the else clause should be for ReadTimeoutException, as shown below.
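
        An illustrative reconstruction of that if-else (the exception class names come from the comment above; the stub types are added only so the fragment is self-contained):

            // Stub types standing in for the classes introduced by the patch.
            class ConnTimeoutException extends java.io.IOException {}
            class ReadTimeoutException extends java.io.IOException {}

            class TimeoutDispatch {
              static String classify(java.io.IOException e) {
                if (e.getClass() == ConnTimeoutException.class) {
                  return "connection timeout";
                } else if (e.getClass() == ReadTimeoutException.class) {
                  // The copy-paste bug repeated ConnTimeoutException.class here,
                  // so read timeouts were never matched by this branch.
                  return "read timeout";
                }
                return "other";
              }
            }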

        Jothi Padmanabhan added a comment -

        Did manual testing by hacking the code to simulate connection/read timeouts.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12386262/hadoop-3327-v2.patch
        against trunk revision 677470.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2891/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2891/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2891/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2891/console

        This message is automatically generated.

        Jothi Padmanabhan added a comment -

        Attaching patch again for the latest trunk. Hopefully, third time lucky!!

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12385858/hadoop-3327-v1.patch
        against trunk revision 676069.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2852/console

        This message is automatically generated.

        Jothi Padmanabhan added a comment -

        Patch for the latest trunk

        Devaraj Das added a comment -

        Sorry, the assignment was updated by mistake.

        Devaraj Das added a comment -

        Sorry, this patch no longer applies cleanly. Please regenerate the patch.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12384737/hadoop-3327.patch
        against trunk revision 671563.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2745/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2745/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2745/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2745/console

        This message is automatically generated.

        Jothi Padmanabhan added a comment -

        Attaching patch for review

        Jothi Padmanabhan added a comment -

        There are two possible optimizations to help mitigate the problem.

        Optimization 1 (At Job tracker)
        ============

        The Job tracker could decide on when to re-execute a map based on the system
        load. System load would be characterized by the total number of map slots
        available across the whole cluster and the number of unfinished map tasks in
        the queue.

        For example,
        Load = (Total Map Slots available - Total Unfinished Maps) / Total Map Slots

        One possible strategy (possible default values for x = 50%, y = 75%; see the sketch after this list):
        1. If (Load < x), re-execute on first fetch failure notification itself.
        2. If (x < Load < y), re-execute on second fetch failure notification.
        3. Always re-execute (irrespective of the system load) on third notification.
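
        As JobTracker-side code, the strategy above might look like this sketch (a hypothetical helper, assuming the suggested defaults x = 50% and y = 75%):

            public class FetchFailurePolicy {
              // Number of fetch-failure notifications before re-executing a map.
              static int notificationsBeforeReexecute(int totalMapSlots,
                                                      int unfinishedMaps) {
                double load = (double) (totalMapSlots - unfinishedMaps) / totalMapSlots;
                if (load < 0.50) return 1;  // 1. re-execute on first notification
                if (load < 0.75) return 2;  // 2. re-execute on second notification
                return 3;                   // 3. always re-execute on third
              }
            }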

        Optimization 2 (At reduce task)
        ===========

        The strategy is to categorize the timeouts (while fetching map outputs) as
        either connection timeouts or read timeouts and then handle each case
        differently. Currently, there is no distinction and all timeouts are handled
        the same way.

        Handling Connection Timeouts
        --------------------------------------------

        1. Try connecting with the default timeout of 30s.
        2. Follow the existing algorithm of Exponential backoff for retries. This
        algorithm is provided below for quick reference.

        Handling Read Timeouts
        -----------------------------------
        1. Read with a time out = MAX(3 minutes, map_run_time)
        2. Back off for a value = (map_run_time/2)
        3. Send notifications after every read time out.

        Exponential Back Off Algorithm
        ########################
        BACKOFF_INIT = 4000

        maxFetchRetriesPerMap =
        getClosestPowerOf2(map_run_time * 1000 / BACKOFF_INIT) + 1;

        currentBackOff = (noFailedFetches <= maxFetchRetriesPerMap)
            ? BACKOFF_INIT * (1 << (noFailedFetches - 1))
            : (this.maxBackoff * 1000 / 2);

        First notification after maxFetchRetriesPerMap attempts
        Second notification after 2 more attempts
        Third notification after another 2 attempts
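
        As a sanity check, the backoff expression above can be rendered as runnable Java (BACKOFF_INIT = 4000 ms as quoted; maxFetchRetriesPerMap = 6 and the maxBackoff value are illustrative, chosen to match the 5-minute-map example below):

            public class BackoffSchedule {
              static final int BACKOFF_INIT = 4000;  // ms

              static long currentBackOff(int noFailedFetches,
                                         int maxFetchRetriesPerMap,
                                         int maxBackoffSec) {
                return (noFailedFetches <= maxFetchRetriesPerMap)
                    ? (long) BACKOFF_INIT * (1L << (noFailedFetches - 1))
                    : (long) maxBackoffSec * 1000 / 2;
              }

              public static void main(String[] args) {
                for (int retry = 1; retry <= 6; retry++) {
                  System.out.println("retry " + retry + ": back off "
                      + currentBackOff(retry, 6, 240) + " ms");
                }
                // Prints 4000, 8000, ..., 128000 ms: the 4+8+16+32+64+128
                // seconds that appear in Case 1 below.
              }
            }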

        Example scenarios for Optimization 2
        Assumptions
        Map run time = 5 mins
        Only one reducer per node (All fetch failures will be from this task alone)

        Case 1. Connect fails.

        Existing algorithm:
        1. First notification = at the end of 6 (maxFetchRetriesPerMap) retries. The exponential
        backoff (EBO) comes into play here.
        Approx time = 3 mins * 7 + (4+8+16+32+64+128) sec [EBO] = 21 + 4.2 = 25.2 mins
        2. Second notification = after 2 more attempts. Approx time = 25 + (3+2+3) = 33 mins.
        (Back off = 2 mins after maxFetchRetriesPerMap.)
        3. Third notification = after another 2 attempts. Approx time = 41 mins

        New algorithm:
        1. First notification = 4.2 mins [EBO] + 30 sec * 7 = 7.5 mins
        2. Second notification = 7.5 + (0.5+2.5+0.5) = 10.5 mins
        3. Third notification = 10.5 + (0.5+2.5+0.5) = 13.5 mins

        Case 2. Connect successful, read fails.

        Existing algorithm:
        Same as Case 1, 41 mins for map re-execution.

        New algorithm:
        1. First Notification = 5 mins (read time out = map_run_time)
        2. Second Notification = 5+2.5+5 = 12.5 mins (back off = map_run_time/2 = 2.5 mins)
        3. Third Notification = 12.5 + 2.5 + 5 = 20 mins

        Devaraj Das added a comment -

        Maybe till we have fetched 90% of the map outputs, we should do exponential backoff, and after that switch to fixed, smaller backoffs. But in the case of multiple jobs running in the cluster this policy might not be ideal (since the same tasktrackers might be serving outputs from multiple jobs).

        Runping Qi added a comment -

        It is pretty clear to me what is broken.
        It is the re-try strategy, which does not take the job/task progression state into account.
        A simple heuristic, as I outlined earlier, will make a big difference.

        Amar Kamat added a comment -

        How do you know when the fetch of a map output was scheduled first?

        Look for the first occurrence of 'Copying task_200804301615_0003_m_000756_0 output from' in the reducer logs.

        Why do you even need to confirm the time for first notification?
        It is obvious that the re-try/backoff strategy is flawed.

        Agreed. But the main cause of the problem needs to be detected and fixed. It's just that we should be sure that what we are fixing is really broken.

        Runping Qi added a comment -

        How do you know when the fetch of a map output was scheduled first?

        Why do you even need to confirm the time for first notification?
        It is obvious that the re-try/backoff strategy is flawed.
        Instead of following the schedule described above, the reducer should consider
        how many outstanding map outputs it still needs.
        If not many map outputs need to be fetched, the reducer should not back off that long.
        Also, the job tracker should decide whether to re-execute a map based on how many fetch failures there are AND how busy the system is.
        If there are very few running mappers, then it should re-execute maps more aggressively.

        Amar Kamat added a comment -

        The above comment is our hypothesis. It matches in the case of the 2nd and 3rd attempts. Can you confirm the 1st attempt? Please check the reducer logs to see the time required at the reducer to notify the first failure, i.e., the time when the failure was reported minus the time when the fetch was first scheduled.

        Runping Qi added a comment -

        Whoo, if a map output cannot be fetched for some reason, it will take at least 45 minutes before the job tracker decides to re-execute the mapper?
        That seems like a really long time!

        Amar Kamat added a comment -

        As Runping mentioned, the map takes roughly 7 mins; looking at the logs:

        2008-04-30 17:32:49,981 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #1 for task task_200804301615_0003_m_000756_0
        2008-04-30 17:45:38,438 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #2 for task task_200804301615_0003_m_000756_0
        2008-04-30 17:56:43,950 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #3 for task task_200804301615_0003_m_000756_0

        Consider the following:
        1) The read timeout for the shuffler is 3 min.
        2) The total time for sending one fetch-failure notification would be ~7 min (determined by the map runtime).
        3) For the first notification, the reducer backs off exponentially.

        attempt # | backoff | timeout | total-time
        0 | 0       | 3 mins | 3 min
        1 | 4 sec   | 3 mins | 4 sec + 6 min
        2 | 8 sec   | 3 mins | 12 sec + 9 min
        3 | 16 sec  | 3 mins | 28 sec + 12 min
        4 | 32 sec  | 3 mins | 60 sec + 15 min
        5 | 64 sec  | 3 mins | 124 sec + 18 min
        6 | 128 sec | 3 mins | 252 sec + 21 min
        7 | 256 sec | 3 mins | 508 sec + 24 min

        i.e., in total the reducer waits for ~32.46 mins before sending the first failure notification.
        4) After (3), the fetch will be attempted twice, each time with a 7/2 min backoff, before sending each subsequent fetch-failure notification.

        attempt | backoff  | timeout | total-time
        1 | 3.5 mins | 3 mins | 6.5 mins
        2 | 3.5 mins | 3 mins | 13 mins

        i.e., a total of 13 mins between the 2nd and 3rd failure notifications.


        The problem is that in this case the read timeout becomes significant compared to the total backoff and the map runtime.
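
        A quick arithmetic check of the first-notification total quoted above (a sketch, not code from the patch):

            public class FirstNotificationTime {
              public static void main(String[] args) {
                long timeoutSec = 3 * 60;  // read timeout per attempt
                int attempts = 8;          // attempts 0..7 in the first table above
                long backoffSec = 0;
                for (int attempt = 1; attempt < attempts; attempt++) {
                  backoffSec += 4L << (attempt - 1);  // 4, 8, 16, ..., 256 sec
                }
                double totalMin = (attempts * timeoutSec + backoffSec) / 60.0;
                // Prints ~32.47 mins, matching the ~32.46 figure above.
                System.out.printf("first notification after %.2f mins%n", totalMin);
              }
            }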

        Raghu Angadi added a comment - edited

        > 2008-04-30 17:25:45,005 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(task_200804301615_0003_m_000756_0,653) failed :
        > java.net.SocketException: Connection timed out
        > at org.mortbay.http.HttpOutputStream.write(HttpOutputStream.java:423)
        > at org.mortbay.jetty.servlet.ServletOut.write(ServletOut.java:54)

        "Connection timed out" error while writing indicates the root cause is mostly the same packet retransmission problem seen in HADOOP-3132.

        Runping Qi added a comment - edited

        Here are the related lines from the job tracker log:

        2008-04-30 17:00:01,346 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200804301615_0003_m_000756_0' to tip tip_200804301615_0003_m_000756, for tracker 'tracker_xxxx'
        2008-04-30 17:07:04,827 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200804301615_0003_m_000756_0' has completed tip_200804301615_0003_m_000756 successfully.
        2008-04-30 17:32:49,981 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #1 for task task_200804301615_0003_m_000756_0
        2008-04-30 17:45:38,438 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #2 for task task_200804301615_0003_m_000756_0
        2008-04-30 17:56:43,950 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #3 for task task_200804301615_0003_m_000756_0
        2008-04-30 17:56:43,950 INFO org.apache.hadoop.mapred.JobInProgress: Too many fetch-failures for output of task: task_200804301615_0003_m_000756_0 ... killing it
        2008-04-30 17:56:43,950 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200804301615_0003_m_000756_0: Too many fetch-failures
        2008-04-30 17:56:43,952 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200804301615_0003_m_000756_1' to tip tip_200804301615_0003_m_000756, for tracker 'tracker_xxxx
        2008-04-30 17:56:45,377 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200804301615_0003_m_000756_0' from 'tracker_xxxx
        2008-04-30 18:02:17,893 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200804301615_0003_m_000756_1' has completed tip_200804301615_0003_m_000756 successfully.
        2008-04-30 18:03:16,193 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200804301615_0003_m_000756_0' from 'tracker_xxxx
        2008-04-30 18:03:16,471 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200804301615_0003_m_000756_1' from 'tracker_xxxx

        The above lines show that about 24 minutes passed between the first notification of
        failure to fetch the map output and the third notification.
        That means the reducer waited about 12 minutes between re-tries!
        The re-execution of the map took only about 7 minutes!
        During the time interval between fetch failure notifications, there were very few active tasks.

        Runping Qi added a comment -

        A reducer seems to have trouble fetching a map output segment.

        There are a lot of exceptions like the one below in the reducer's log:

        2008-04-30 17:27:32,155 WARN org.apache.hadoop.mapred.ReduceTask: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        at sun.net.www.http.ChunkedInputStream.fastRead(ChunkedInputStream.java:221)
        at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:662)
        at java.io.FilterInputStream.read(FilterInputStream.java:116)
        at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2364)
        at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2359)
        at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:205)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:828)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:777)

        In the hadoop.log of the task tracker hosting the map output, I saw a lot of exceptions like:

        2008-04-30 17:25:45,005 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(task_200804301615_0003_m_000756_0,653) failed :
        java.net.SocketException: Connection timed out

        at org.mortbay.http.HttpOutputStream.write(HttpOutputStream.java:423)
        at org.mortbay.jetty.servlet.ServletOut.write(ServletOut.java:54)
        at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2353)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
        at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
        at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
        at org.mortbay.http.HttpServer.service(HttpServer.java:954)
        at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
        at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
        at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
        at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
        at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
        at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

        2008-04-30 17:25:45,005 WARN /: /mapOutput?job=job_200804301615_0003&map=task_200804301615_0003_m_000756_0&reduce=653:
        java.lang.IllegalStateException: Committed
        at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212)
        at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375)
        at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2376)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
        at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
        at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
        at org.mortbay.http.HttpServer.service(HttpServer.java:954)
        at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
        at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
        at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
        at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
        at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
        at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)


          People

          • Assignee: Amareshwari Sriramadasu
          • Reporter: Runping Qi
          • Votes: 0
          • Watchers: 6
