Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: task
    • Labels:
      None
    • Release Note:
      Added two expert level configuration properties.
      1. "mapreduce.reduce.shuffle.notify.readerror" controls whether to send a notification to the JobTracker after every read error. If the configuration is false, read errors are treated like connection errors.
      2. "mapreduce.reduce.shuffle.maxfetchfailures" specifies the maximum number of fetch failures after which the failure is reported to the JobTracker.
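      As a sketch of how a job might use these two properties (the property names come from the release note above; the values here are illustrative, with 10 matching the default retry count mentioned later in this thread), the settings could go into mapred-site.xml or the job configuration:

```xml
<!-- Illustrative values; both properties are expert-level settings. -->
<property>
  <name>mapreduce.reduce.shuffle.notify.readerror</name>
  <value>false</value>
  <!-- false: treat read errors like connection errors instead of
       notifying the JobTracker after every read error -->
</property>
<property>
  <name>mapreduce.reduce.shuffle.maxfetchfailures</name>
  <value>10</value>
  <!-- report to the JobTracker only after this many failed fetches -->
</property>
```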

      Description

      Since we upgraded to hadoop-0.20.1 from hadoop-0.18.3, we see a lot more map task failures because of 'Too many fetch-failures'.

      One of our jobs makes hardly any progress because 3000 reduces are not able to get the map output of 2 trailing maps (with about 80GB output each), which are repeatedly marked as failed because reduces cannot fetch their map output.
      One difference from hadoop-0.18.3 seems to be that reduce tasks report a failed map-output fetch even after a single try when it was a read error (cr.getError().equals(CopyOutputErrorType.READ_ERROR)). I do not think this is a good idea, as trailing map tasks will be attacked by all reduces simultaneously.

      Here is a log output of a reduce task:

      2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200910281903_0028_r_000000_0 copy failed: attempt_200910281903_0028_m_002781_1 from some host
      2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: java.net.SocketTimeoutException: Read timed out        at java.net.SocketInputStream.socketRead0(Native Method)
              at java.net.SocketInputStream.read(SocketInputStream.java:129)
              at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
              at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
              at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
              at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
              at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
              at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1064)
              at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1496)
              at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1377)
              at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1289)
              at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1220)
      
      2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_200910281903_0028_r_000000_0: Failed fetch #1 from attempt_200910281903_0028_m_002781_1
      2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Failed to fetch map-output from attempt_200910281903_0028_m_002781_1 even after MAX_FETCH_RETRIES_PER_MAP retries...  or it is a read error,  reporting to the JobTracker.
      

      Also I saw a few log messages which look suspicious, as if successfully fetched map output is discarded because the map was marked as failed (because of too many fetch failures). This would make the situation even worse.

      2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_200910281903_0028_m_001076_0, compressed len: 21882555, decompressed len: 23967845
      2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 23967845 bytes (21882555 raw bytes) into RAM from attempt_200910281903_0028_m_001076_0
      2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Read 23967845 bytes from map-output for attempt_200910281903_0028_m_001076_0
      2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_200910281903_0028_m_001076_0 -> (20, 39772) from some host
      ...
      2009-10-29 22:10:07,220 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of FAILED map-task: 'attempt_200910281903_0028_m_001076_0'
      
      1. patch-1171-ydist.txt
        2 kB
        Amareshwari Sriramadasu
      2. patch-1171-2.txt
        3 kB
        Amareshwari Sriramadasu
      3. patch-1171-1-ydist.txt
        2 kB
        Amareshwari Sriramadasu
      4. patch-1171-1.txt
        4 kB
        Amareshwari Sriramadasu
      5. patch-1171.txt
        4 kB
        Amareshwari Sriramadasu

        Issue Links

          Activity

          Christian Kunz added a comment -

          HADOOP-3327 seemed to have introduced the jobtracker notification on a single read-error.

          Qi Liu added a comment -

          This is caused by a behavioral change in hadoop 0.20.1. In 0.18.3, the fetch-retry behavior used to be that each map output fetch would retry N (by default 6) times. In 0.20.1, not every map output fetch gets N retries: for a particular map output, only the first fetch attempt gets N retries. If the first N retries fail, each subsequent fetch attempt, even if it is from a different node, gets only 2 retries before failure. This greatly increases the chance of hitting "too many fetch failures".

          The line of code is in src/mapred/org/apache/hadoop/mapred/ReduceTask.java, line 2090 to line 2104.

          I would argue that if the subsequent map output fetch attempt is from the mapper node, it should only have 2 retries. However, if the map output is from a different mapper node (basically a different map attempt), it should still have N retries.
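          The retry budget described above can be sketched as follows (a minimal illustration with invented names, not the actual ReduceTask code; N and the default of 6 are taken from the comment):

```java
// Hypothetical sketch of the 0.20.1 fetch-retry budget described above:
// only the first fetch attempt for a given map output gets the full N
// retries; every later attempt, even against a different node, gets 2.
public class FetchRetryBudget {
    static final int DEFAULT_RETRIES = 6; // N, default per the comment

    static int retriesAllowed(boolean firstAttemptForThisMapOutput) {
        return firstAttemptForThisMapOutput ? DEFAULT_RETRIES : 2;
    }

    public static void main(String[] args) {
        // First attempt gets the full budget; later attempts get only 2.
        System.out.println(retriesAllowed(true));
        System.out.println(retriesAllowed(false));
    }
}
```

Under this model, a map output whose first N retries are exhausted is effectively two failures away from a fetch-failure report on every later attempt, which is why the failure count climbs so quickly.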

          Christian Kunz added a comment -

          The algorithm in hadoop-0.20 that determines when to mark a map task as having too many fetch failures seems to have big problems with jobs that have a few outliers in map task runtimes.

          The job mentioned in the description has a couple of map tasks requiring about 3 hours of runtime (1hr application execution time and 2hrs merge phase), with the overall average map runtime much smaller.
          The consequence is that all reduces just wait for these few map tasks to finish, and these maps get declared as 'failed' very quickly because of many socket timeout exceptions on the reduce side, although there is nothing wrong with the map tasks.

          The first attempt to run the job failed after 28hrs (we allow for up to 10 map task failures), with only about 200 reduces successfully fetching map output each time the long running map tasks completed; these map tasks got declared as failed within less than 1 minute after completion.

          We deployed an emergency fix in the ReduceTask, commenting out the portion where a single read error results in a report:

            
            if (//cr.getError().equals(CopyOutputErrorType.READ_ERROR) ||
               ((noFailedFetches >= fetchRetriesPerMap)
                && ((noFailedFetches - fetchRetriesPerMap) % 2) == 0)) {
              synchronized (ReduceTask.this) {
                taskStatus.addFetchFailedMap(mapTaskId);

          and it helped somewhat.
          The job finished after 6 re-executions of the long running map tasks (fortunately, we allow a high number of map task failures), but still took 18hrs instead of optimally less than 4hrs. About 600 reduces got map output successfully before the map tasks got declared as failed, always because of 5+ socket timeout exceptions (one advantage of hadoop-0.20, retrying a fetch within a few seconds, was counter-productive here).

          With hadoop-0.18.3 this job needed less than 9 hrs (not optimal but we could live with it).

          Takeaway:
          The current algorithm is not general enough to cover all corner cases, in particular jobs with a handful of much longer running map tasks.
          It should be improved (there should be an allowance for socket timeout exceptions when a high number of reduces fetch output from a single map task in parallel), or there should be a configuration option to allow overriding the threshold on a per-job basis.

          Amareshwari Sriramadasu added a comment -

          HADOOP-3327 went into branch 0.21

          Amareshwari Sriramadasu added a comment -

          Christian, are you using Yahoo! distribution for 0.20?

          In branch 0.21, MAPREDUCE-353 makes connect and read timeouts configurable for a job. Moreover, the shuffle is simplified by MAPREDUCE-318. Essentially, HADOOP-3327 is no longer there.

          Christian, making connect and read timeouts configurable should address this issue, right?

          Christian Kunz added a comment -

          Just for the record, we use a 0.20.1 Yahoo! release.
          I checked that Cloudera releases contain HADOOP-3327 as early as hadoop-0.20.0+61.

          Christian Kunz added a comment -

          Yes, in the absence of MAPREDUCE-318, MAPREDUCE-353 should be sufficient to work around the issue.

          dhruba borthakur added a comment -

          Are the fixes for this problem already checked into the Yahoo! 0.20 release?

          Amareshwari Sriramadasu added a comment -

          No, they are not checked in.

          Amareshwari Sriramadasu added a comment -

          In branch 0.21, MAPREDUCE-353 makes connect and read timeouts configurable for a job. Moreover, the shuffle is simplified by MAPREDUCE-318. Essentially, HADOOP-3327 is no longer there.

          Sorry, I just verified the new shuffle code; the code for HADOOP-3327 still exists.

          Amareshwari Sriramadasu added a comment -

          One solution I could see is:
          "Provide a config option for whether to send a notification on a read error or not, and make the number of retries on a fetch failure (currently hardcoded to 10 on 0.21) configurable."
          However, for the Yahoo! distribution, the config option for whether to send a notification on a read error would suffice.

          Jothi Padmanabhan added a comment -

          Provide a config option whether to send a notification for read error or not

          Just to clarify this a little more: currently a notification is sent for every read error. The config option would disable this behavior and treat read errors similar to connection errors.
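          The resulting reporting rule can be sketched like this (a hypothetical illustration with invented class and method names, not the patch itself; the true/false semantics follow the corrected description later in this thread):

```java
// Sketch of the reporting rule the two new properties control:
// notify.readerror=true  -> report a read error to the JobTracker immediately;
// notify.readerror=false -> treat read errors like connection errors and
// report only after maxfetchfailures failed fetches.
public class FetchFailureReportPolicy {
    final boolean notifyReadError; // mapreduce.reduce.shuffle.notify.readerror
    final int maxFetchFailures;    // mapreduce.reduce.shuffle.maxfetchfailures

    FetchFailureReportPolicy(boolean notifyReadError, int maxFetchFailures) {
        this.notifyReadError = notifyReadError;
        this.maxFetchFailures = maxFetchFailures;
    }

    boolean shouldReport(boolean isReadError, int failedFetchesSoFar) {
        if (isReadError && notifyReadError) {
            return true; // notify on every read error
        }
        return failedFetchesSoFar >= maxFetchFailures;
    }

    public static void main(String[] args) {
        FetchFailureReportPolicy lenient = new FetchFailureReportPolicy(false, 10);
        // A single read error is not reported; 10 accumulated failures are.
        System.out.println(lenient.shouldReport(true, 1));
        System.out.println(lenient.shouldReport(true, 10));
    }
}
```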

          Amareshwari Sriramadasu added a comment -

          Patch adding the suggested config variables.

          Tested the patch manually by simulating read timeout and verified with configuration "mapreduce.reduce.shuffle.notify.readerror" set to true, the fetch failure is notified after "mapreduce.reduce.shuffle.maxfetchfailures". With "mapreduce.reduce.shuffle.notify.readerror" set to false, it is notified immediately after the failure.

          Amareshwari Sriramadasu added a comment -

          verified with configuration "mapreduce.reduce.shuffle.notify.readerror" set to true, the fetch failure is notified after "mapreduce.reduce.shuffle.maxfetchfailures". With "mapreduce.reduce.shuffle.notify.readerror" set to false, it is notified immediately after the failure.

          Sorry, I interchanged true and false in the above statement. Verified that with "mapreduce.reduce.shuffle.notify.readerror" set to false, the fetch failure is notified after "mapreduce.reduce.shuffle.maxfetchfailures" failures; with "mapreduce.reduce.shuffle.notify.readerror" set to true, it is notified immediately after the failure.

          Jothi Padmanabhan added a comment -

          Patch looks fine to me; a couple of minor nits:

          1. Can we rename maxFetchFailuresBeforeReport to maxFetchFailuresBeforeReporting?
          2. I think the documentation in mapred-default for mapreduce.reduce.shuffle.notify.readerror can be changed to something like: "Expert. Flag to decide whether the JobTracker should be notified on every read error or not. If the flag is false, read errors are treated similar to connection errors."
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12427455/patch-1171.txt
          against trunk revision 888761.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/308/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/308/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/308/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/308/console

          This message is automatically generated.

          Amareshwari Sriramadasu added a comment -

          Patch with the changes suggested by Jothi.

          Amareshwari Sriramadasu added a comment -

          Patch for Yahoo! distribution.

          test-patch result:

               [exec] -1 overall.
               [exec]
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]
               [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
               [exec]                         Please justify why no tests are needed for this patch.
               [exec]
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec]
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec]
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
          

          Manually verified the configuration is honored by simulating read time out for a map on a tasktracker.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12427567/patch-1171-ydist.txt
          against trunk revision 889085.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/181/console

          This message is automatically generated.

          Amareshwari Sriramadasu added a comment -

          Removed the documentation from mapred-default.xml, as suggested by Arun.

          Amareshwari Sriramadasu added a comment -

          Patch for trunk.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12427572/patch-1171-2.txt
          against trunk revision 889085.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/314/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/314/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/314/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/314/console

          This message is automatically generated.

          Amareshwari Sriramadasu added a comment -

          The test failure, TestTrackerBlacklistAcrossJobs, is unrelated to the patch. The log has the following ZipException:

          2009-12-10 07:51:17,307 WARN  mapred.TaskTracker (TaskTracker.java:startNewTask(1887)) - Error initializing attempt_20091210075025802_0001_m_000001_0:
          java.lang.RuntimeException: java.util.zip.ZipException: ZIP_Read: error reading zip file
          	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1600)
          	at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1408)
          	at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1352)
          	at org.apache.hadoop.conf.Configuration.get(Configuration.java:574)
          	at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:1874)
          	at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:392)
          	at org.apache.hadoop.mapred.TaskTracker.localizeJobFiles(TaskTracker.java:925)
          	at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:869)
          	at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1883)
          	at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:109)
          	at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1848)
          

          The same test passes on my machine.

          Amareshwari Sriramadasu added a comment -

          All unit tests except TestHdfsProxy passed on my machine with the ydist patch.

          dhruba borthakur added a comment -

          Hi Amareshwari & Jothi: can you please advise whether you plan to check this patch into the 0.20 release, or into the Yahoo! distribution 0.20 release?

          Arun C Murthy added a comment -

          Dhruba - currently the plan is to put this into 0.21 and y20.

          Arun C Murthy added a comment -

          I've just committed this. Thanks, Amareshwari!

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #171 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/171/)
          Allow shuffle retries and read-error reporting to be configurable. Contributed by Amareshwari Sriramadasu.


            People

            • Assignee:
              Amareshwari Sriramadasu
              Reporter:
              Christian Kunz
            • Votes:
              0
              Watchers:
              7
