Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1143

runningMapTasks counter is not properly decremented in case of failed Tasks.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.20.1
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Corrects the behaviour of tasks counters in case of failed tasks.Incorrect counter values can lead to bad scheduling decisions .This jira rectifies the problem by making sure decrement properly happens incase of failed tasks.
    1. MAPRED-1143-ydist-9.patch
      4 kB
      rahul k singh
    2. MAPRED-1143-ydist-8.patch.txt
      4 kB
      rahul k singh
    3. MAPRED-1143-ydist-7.patch
      4 kB
      rahul k singh
    4. MAPRED-1143-ydist-6.patch
      3 kB
      rahul k singh
    5. MAPRED-1143-ydist-5.patch
      3 kB
      rahul k singh
    6. MAPRED-1143-ydist-4.patch
      3 kB
      rahul k singh
    7. MAPRED-1143-ydist-3.patch
      3 kB
      rahul k singh
    8. MAPRED-1143-ydist-2.patch
      2 kB
      rahul k singh
    9. MAPRED-1143-ydist-1.patch
      2 kB
      rahul k singh
    10. MAPRED-1143-v21.patch
      8 kB
      rahul k singh
    11. MAPRED-1143-7.patch
      8 kB
      rahul k singh
    12. MAPRED-1143-6.patch
      8 kB
      rahul k singh
    13. MAPRED-1143-5.patch.txt
      7 kB
      rahul k singh
    14. MAPRED-1143-4.patch
      7 kB
      rahul k singh
    15. MAPRED-1143-3.patch
      7 kB
      rahul k singh
    16. MAPRED-1143-2.patch
      8 kB
      rahul k singh
    17. MAPRED-1143-2.patch
      8 kB
      rahul k singh
    18. MAPRED-1143-1.patch
      7 kB
      rahul k singh

      Activity

      Hide
      rahul k singh added a comment -

      We increment runningMapTasks counter every time , we give a task speculative or non-speculative . This counter is not decremented properly in case there is a failed task while of the speculative attempts is alive. This is because

      if(!tip.isRunning()) {
       decrement running counter;
      }
      

      tip.isRunning() would keep on returning true if any of the attempts in the tip are still running.

      This would mean that for each task , we would only decrement once , whereas we are incrementing the counter for every attempt of the task.

      Show
      rahul k singh added a comment - We increment runningMapTasks counter every time , we give a task speculative or non-speculative . This counter is not decremented properly in case there is a failed task while of the speculative attempts is alive. This is because if (!tip.isRunning()) { decrement running counter; } tip.isRunning() would keep on returning true if any of the attempts in the tip are still running. This would mean that for each task , we would only decrement once , whereas we are incrementing the counter for every attempt of the task.
      Hide
      rahul k singh added a comment -

      Attaching the patch with the test case.

      Show
      rahul k singh added a comment - Attaching the patch with the test case.
      Hide
      rahul k singh added a comment -

      yahoo dist

      Show
      rahul k singh added a comment - yahoo dist
      Hide
      rahul k singh added a comment -

      yahoo dist patch --no-prefix

      Show
      rahul k singh added a comment - yahoo dist patch --no-prefix
      Hide
      rahul k singh added a comment -

      Attaching the new patch ,

      There was small correction in terms of usage of wasRunning.
      In the new patch we decrement counters when wasRunning is true.

      Show
      rahul k singh added a comment - Attaching the new patch , There was small correction in terms of usage of wasRunning. In the new patch we decrement counters when wasRunning is true.
      Hide
      Amareshwari Sriramadasu added a comment -

      second if block should still check for wasRunning && !isRunning

      Show
      Amareshwari Sriramadasu added a comment - second if block should still check for wasRunning && !isRunning
      Hide
      Amareshwari Sriramadasu added a comment -

      Sorry, second if check looks fine.
      But meterTaskAttempt is calculating statistics. I think It should be outside the !isRunning check.

      Show
      Amareshwari Sriramadasu added a comment - Sorry, second if check looks fine. But meterTaskAttempt is calculating statistics. I think It should be outside the !isRunning check.
      Hide
      rahul k singh added a comment -

      after discussing the meterTaskAttempt issue internally it was decided to leave it as is .

      Show
      rahul k singh added a comment - after discussing the meterTaskAttempt issue internally it was decided to leave it as is .
      Hide
      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12423438/MAPRED-1143-2.patch
      against trunk revision 830531.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      -1 patch. The patch command could not apply the patch.

      Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/99/console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12423438/MAPRED-1143-2.patch against trunk revision 830531. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/99/console This message is automatically generated.
      Hide
      Jothi Padmanabhan added a comment -

      Vert minor nits with the test case, since you need to regenerate the patch anyway:

      1. I do not think job.finishTask(taskAttemptID[6]) is required as that attempt is already failed
      2. Can we rename oldRunningReduce to oldRunningReduces
      3. Do we need to to clock.advance(20000) before clock.advanceBySpeculativeLag(), just checking..
      Show
      Jothi Padmanabhan added a comment - Vert minor nits with the test case, since you need to regenerate the patch anyway: I do not think job.finishTask(taskAttemptID [6] ) is required as that attempt is already failed Can we rename oldRunningReduce to oldRunningReduces Do we need to to clock.advance(20000) before clock.advanceBySpeculativeLag() , just checking..
      Hide
      rahul k singh added a comment -

      Changed the patch to handle changes to work with ydist branch

      Show
      rahul k singh added a comment - Changed the patch to handle changes to work with ydist branch
      Hide
      rahul k singh added a comment -

      Changed to patch to apply on trunk and added comments by jothi

      Show
      rahul k singh added a comment - Changed to patch to apply on trunk and added comments by jothi
      Hide
      Hadoop QA added a comment -

      +1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12423564/MAPRED-1143-3.patch
      against trunk revision 830837.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed core unit tests.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/103/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/103/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/103/artifact/trunk/build/test/checkstyle-errors.html
      Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/103/console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12423564/MAPRED-1143-3.patch against trunk revision 830837. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/103/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/103/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/103/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/103/console This message is automatically generated.
      Hide
      rahul k singh added a comment -

      Attached the new patch moved metering along with other counters.

      Show
      rahul k singh added a comment - Attached the new patch moved metering along with other counters.
      Hide
      rahul k singh added a comment -

      a small nit in the earlier ydist patch , rectified in this

      Show
      rahul k singh added a comment - a small nit in the earlier ydist patch , rectified in this
      Hide
      rahul k singh added a comment -

      added metering directly under wasRunning

      Show
      rahul k singh added a comment - added metering directly under wasRunning
      Hide
      Amareshwari Sriramadasu added a comment -

      changes look fine to me.

      Show
      Amareshwari Sriramadasu added a comment - changes look fine to me.
      Hide
      Hadoop QA added a comment -

      +1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12423665/MAPRED-1143-4.patch
      against trunk revision 831037.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed core unit tests.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/117/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/117/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/117/artifact/trunk/build/test/checkstyle-errors.html
      Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/117/console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12423665/MAPRED-1143-4.patch against trunk revision 831037. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/117/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/117/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/117/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/117/console This message is automatically generated.
      Hide
      Hemanth Yamijala added a comment -

      Not really about this patch, but the logic for task status updates is way too complicated to reason correctly. For instance, even after this patch, I see that the number of running tasks is decremented under different checks when a task completes and when a task fails. I assume this is for good reason, but still it is difficult to review.

      A radically different idea, which might be completely wrong (smile):

      Can we assume this invariant:

      • A task attempt is added to the list of active tasks in TaskInProgress whenever it is scheduled. Running task counts should be incremented in this code path.
      • Likewise, a decrement of running tasks should happen whenever the task attempt being reported as completed or failed is removed from the active tasks list.

      At least to me, this appears simpler to understand, and fixing the issue reported. Does this make sense ? Anything wrong in the logic ?

      Show
      Hemanth Yamijala added a comment - Not really about this patch, but the logic for task status updates is way too complicated to reason correctly. For instance, even after this patch, I see that the number of running tasks is decremented under different checks when a task completes and when a task fails. I assume this is for good reason, but still it is difficult to review. A radically different idea, which might be completely wrong ( smile ): Can we assume this invariant: A task attempt is added to the list of active tasks in TaskInProgress whenever it is scheduled. Running task counts should be incremented in this code path. Likewise, a decrement of running tasks should happen whenever the task attempt being reported as completed or failed is removed from the active tasks list. At least to me, this appears simpler to understand, and fixing the issue reported. Does this make sense ? Anything wrong in the logic ?
      Hide
      Hemanth Yamijala added a comment -

      I spoke to Amarsri and Rahul about my comments and found out some explanations:

      For instance, even after this patch, I see that the number of running tasks is decremented under different checks when a task completes and when a task fails. I assume this is for good reason, but still it is difficult to review.

      So, the different checks are as follows:

      completedTask() {
        if (this tip is complete) {
          return;
        }
        update counters
      }
      
      failedTask() {
        if (any attempt was running for this tip before status update) {
          update counters
        }
      }
      

      It appears completedTask doesn't need the check for TIP being complete at all, as it can never happen. A tip is marked complete only if atleast one attempt has completed and remains so. If another attempt comes in reporting success now, we fail this in status update and do not follow the completedTask code path at all. So, for all practical purposes, counters are being updated unconditionally in completedTask. Further, in the same code path, the task is removed from the active tasks as well. Hence no further check is necessary.

      The check in failedTask is required though. This is because a task can fail after it has been marked as succeeded. For e.g. if there are fetch failures for a map, or if a tracker is lost. In this case, we should not update counters again because they would have already been updated when the task succeeded.

      However, in this context, I am a little worried that we are checking for any attempt being running before status update, rather than this specific attempt. At least in theory it is possible this results in some inconsistency.

      Consider this sequence of events:

      • A task is scheduled
      • It is speculated
      • It completes -> Counters are decremented here.
      • It fails (lost TT, fetch failures) -> The current patch will decrement counters here again.
      • The speculated attempt succeeds.

      In practice though, this scenario may not be very likely. Apparently fetch failures and lost TTs are the only extreme cases when this is possible. And there is considerable time lag that can happen before a task completes and it has to be failed. The time lag will in most cases be large enough to kill the speculative attempt as well.

      With this background, is it worth changing the current patch to:

      failedTask() {
        if (this task was running before status update) {
          update counters
        }
      }
      

      This seems more correct to me, but was wondering if it was worth the change. Thoughts ?

      Show
      Hemanth Yamijala added a comment - I spoke to Amarsri and Rahul about my comments and found out some explanations: For instance, even after this patch, I see that the number of running tasks is decremented under different checks when a task completes and when a task fails. I assume this is for good reason, but still it is difficult to review. So, the different checks are as follows: completedTask() { if ( this tip is complete) { return ; } update counters } failedTask() { if (any attempt was running for this tip before status update) { update counters } } It appears completedTask doesn't need the check for TIP being complete at all, as it can never happen. A tip is marked complete only if atleast one attempt has completed and remains so. If another attempt comes in reporting success now, we fail this in status update and do not follow the completedTask code path at all. So, for all practical purposes, counters are being updated unconditionally in completedTask. Further, in the same code path, the task is removed from the active tasks as well. Hence no further check is necessary. The check in failedTask is required though. This is because a task can fail after it has been marked as succeeded. For e.g. if there are fetch failures for a map, or if a tracker is lost. In this case, we should not update counters again because they would have already been updated when the task succeeded. However, in this context, I am a little worried that we are checking for any attempt being running before status update, rather than this specific attempt. At least in theory it is possible this results in some inconsistency. Consider this sequence of events: A task is scheduled It is speculated It completes -> Counters are decremented here. It fails (lost TT, fetch failures) -> The current patch will decrement counters here again. The speculated attempt succeeds. In practice though, this scenario may not be very likely. Apparently fetch failures and lost TTs are the only extreme cases when this is possible. And there is considerable time lag that can happen before a task completes and it has to be failed. The time lag will in most cases be large enough to kill the speculative attempt as well. With this background, is it worth changing the current patch to: failedTask() { if ( this task was running before status update) { update counters } } This seems more correct to me, but was wondering if it was worth the change. Thoughts ?
      Hide
      Hemanth Yamijala added a comment -

      After a discussion with Arun, I felt I might clarify little more on what I am proposing. Some details:

      • In TaskInProgress.java, introduce:
          boolean isRunning(TaskAttemptID taskId) {
            return activeTasks.containsKey(taskId);
          }
        
      • Modify JobInProgress.failedTask private API to have an additional parameter wasAttemptRunning, which would be initialized in JIP.updateTaskStatus to tip.isRunning(status.getTaskID())
      • Use wasAttemptRunning only to update the running* counters

      I originally thought we can modify wasRunning to indicate if the attempt was running (rather than if the TIP was running). But after speaking with Arun, I feel we want to localize the changes to as much as possible.

      Show
      Hemanth Yamijala added a comment - After a discussion with Arun, I felt I might clarify little more on what I am proposing. Some details: In TaskInProgress.java, introduce: boolean isRunning(TaskAttemptID taskId) { return activeTasks.containsKey(taskId); } Modify JobInProgress.failedTask private API to have an additional parameter wasAttemptRunning, which would be initialized in JIP.updateTaskStatus to tip.isRunning(status.getTaskID()) Use wasAttemptRunning only to update the running* counters I originally thought we can modify wasRunning to indicate if the attempt was running (rather than if the TIP was running). But after speaking with Arun, I feel we want to localize the changes to as much as possible.
      Hide
      Arun C Murthy added a comment -

      Had an offline discussion with Owen and Devaraj about this mess...

      Owen suggested a better, albeit more involved, fix:

      Update the counters in TIP.updateStatus since it's the single entry point for updating TaskStatus for a task-attempt and since JIP is already locked we reach into into and update runningMapTasks, runningReduceTasks, failedMapTasks, failedReduceTasks, speculativeMapTasks and speculativeReduceTasks.

      Thoughts?

      Show
      Arun C Murthy added a comment - Had an offline discussion with Owen and Devaraj about this mess... Owen suggested a better, albeit more involved, fix: Update the counters in TIP.updateStatus since it's the single entry point for updating TaskStatus for a task-attempt and since JIP is already locked we reach into into and update runningMapTasks, runningReduceTasks, failedMapTasks, failedReduceTasks, speculativeMapTasks and speculativeReduceTasks. Thoughts?
      Hide
      Hemanth Yamijala added a comment -

      If I understand correctly, you are suggesting we expose the counters in JIP for updates from TIP, right ? I feel conceptually, TIP does not own the counters, JIP does. Hence, I feel updates to these must be done by JIP and not TIP. Wouldn't you agree with this ?

      Show
      Hemanth Yamijala added a comment - If I understand correctly, you are suggesting we expose the counters in JIP for updates from TIP, right ? I feel conceptually, TIP does not own the counters, JIP does. Hence, I feel updates to these must be done by JIP and not TIP. Wouldn't you agree with this ?
      Hide
      Devaraj Das added a comment -

      Yes the suggestion is to increment the counters in the TIP. This is not the ideal solution but it seems like doing the counter updates there via accessing the same from the corresponding JIP would make the updates very precise...

      Show
      Devaraj Das added a comment - Yes the suggestion is to increment the counters in the TIP. This is not the ideal solution but it seems like doing the counter updates there via accessing the same from the corresponding JIP would make the updates very precise...
      Hide
      Hemanth Yamijala added a comment -

      Arun helped us understand the change that was discussed and I really like the idea. Rather than looking at this as a way of the TIP calling into the JIP to update counters, I started thinking about this as the TIP deciding what has changed in its state and informing about the change to the JIP. Then JIP decides to update its state (basically the counters) based on this information. This is a really good separation of ownership.

      However, Arun, Rahul and I looked into making this change into the current code base and fell out of confidence with respect to the level of change it was causing. Given this point, we now feel that my original suggestion here, while agreeably not as good as Owen's, is a safer and more localized change to implement than the alternate proposal. And we suggest taking that approach to fix this JIRA.

      A rewrite of this status update code is really long overdue, and it would anyway be done in the manner Owen is proposing, but in a separate JIRA. Hope this is agreeable to all.

      Show
      Hemanth Yamijala added a comment - Arun helped us understand the change that was discussed and I really like the idea. Rather than looking at this as a way of the TIP calling into the JIP to update counters, I started thinking about this as the TIP deciding what has changed in its state and informing about the change to the JIP. Then JIP decides to update its state (basically the counters) based on this information. This is a really good separation of ownership. However, Arun, Rahul and I looked into making this change into the current code base and fell out of confidence with respect to the level of change it was causing. Given this point, we now feel that my original suggestion here , while agreeably not as good as Owen's, is a safer and more localized change to implement than the alternate proposal. And we suggest taking that approach to fix this JIRA. A rewrite of this status update code is really long overdue, and it would anyway be done in the manner Owen is proposing, but in a separate JIRA. Hope this is agreeable to all.
      Hide
      Arun C Murthy added a comment -

      Clearly +1 to Hemanth's suggestion from my end since I'm part of the cabal... smile

      Show
      Arun C Murthy added a comment - Clearly +1 to Hemanth's suggestion from my end since I'm part of the cabal... smile
      Hide
      rahul k singh added a comment -

      Incorporate the hemanth's suggestions

      Show
      rahul k singh added a comment - Incorporate the hemanth's suggestions
      Hide
      rahul k singh added a comment -

      checking with hudson

      Show
      rahul k singh added a comment - checking with hudson
      Hide
      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12427709/MAPRED-1143-5.patch.txt
      against trunk revision 889571.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      -1 core tests. The patch failed core unit tests.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/320/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/320/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/320/artifact/trunk/build/test/checkstyle-errors.html
      Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/320/console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12427709/MAPRED-1143-5.patch.txt against trunk revision 889571. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/320/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/320/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/320/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/320/console This message is automatically generated.
      Hide
      Arun C Murthy added a comment -

      The trunk and ydist patch differ in one significant manner:

      +    if(wasAttemptRunning) {
      +      if(!tip.isJobCleanupTask() && !tip.isJobSetupTask()) {
      +        if(tip.isMapTask()) {
      +          runningMapTasks -= 1;
      +        } else {
      +          runningReduceTasks -= 1;
      +        }
      +      }
      +    }
      
           boolean isComplete = tip.isComplete();
      +    if (wasAttemptRunning) {
      +      if (!tip.isJobCleanupTask() && !tip.isJobSetupTask()) {
      +        if (tip.isMapTask()) {
      +          runningMapTasks -= 1;
      +          metrics.failedMap(taskid);
      +        } else {
      +          runningReduceTasks -= 1;
      +          metrics.failedReduce(taskid);
      +        }
      +      }
      +    }
      

      Minor nits:

      1. Spacing: call to JIP.failedTask needs a space for the new parameter
      2. The above block of code could use a comment
      3. I'd appreciate if the following change was minimized by adding the new parameter to a new line by itself without changing the others:
      -  private void failedTask(TaskInProgress tip, TaskAttemptID taskid, 
      -                          TaskStatus status, 
      -                          TaskTracker taskTracker,
      -                          boolean wasRunning, boolean wasComplete) {
      +  private void failedTask(
      +    TaskInProgress tip, TaskAttemptID taskid, TaskStatus status,
      +    TaskTracker taskTracker, boolean wasRunning, boolean wasComplete,
      +    boolean wasAttemptRunning) {
      
      Show
      Arun C Murthy added a comment - The trunk and ydist patch differ in one significant manner: + if(wasAttemptRunning) { + if(!tip.isJobCleanupTask() && !tip.isJobSetupTask()) { + if(tip.isMapTask()) { + runningMapTasks -= 1; + } else { + runningReduceTasks -= 1; + } + } + } boolean isComplete = tip.isComplete(); + if (wasAttemptRunning) { + if (!tip.isJobCleanupTask() && !tip.isJobSetupTask()) { + if (tip.isMapTask()) { + runningMapTasks -= 1; + metrics.failedMap(taskid); + } else { + runningReduceTasks -= 1; + metrics.failedReduce(taskid); + } + } + } Minor nits: Spacing: call to JIP.failedTask needs a space for the new parameter The above block of code could use a comment I'd appreciate if the following change was minimized by adding the new parameter to a new line by itself without changing the others: - private void failedTask(TaskInProgress tip, TaskAttemptID taskid, - TaskStatus status, - TaskTracker taskTracker, - boolean wasRunning, boolean wasComplete) { + private void failedTask( + TaskInProgress tip, TaskAttemptID taskid, TaskStatus status, + TaskTracker taskTracker, boolean wasRunning, boolean wasComplete, + boolean wasAttemptRunning) {
      Hide
      rahul k singh added a comment -

      metrics related counters were removed from trunk as part of 1152 , hence they are not in the fix for trunk , For yahoo distribution metric is still in there in failedTask.

      implemented suggestions by arun.

      Show
      rahul k singh added a comment - metrics related counters were removed from trunk as part of 1152 , hence they are not in the fix for trunk , For yahoo distribution metric is still in there in failedTask. implemented suggestions by arun.
      Hide
      rahul k singh added a comment -

      checking with hudson

      Show
      rahul k singh added a comment - checking with hudson
      Hide
      Amareshwari Sriramadasu added a comment -

      I have the same comment as at this. Shouldn't meterTaskAttempt() be in wasAttemptRunning check ?

      Show
      Amareshwari Sriramadasu added a comment - I have the same comment as at this . Shouldn't meterTaskAttempt() be in wasAttemptRunning check ?
      Hide
      rahul k singh added a comment -

      As hemanth mentioned above here , for this jira , the changes are confined to runinng count in the trunk.

      Show
      rahul k singh added a comment - As hemanth mentioned above here , for this jira , the changes are confined to runinng count in the trunk.
      Hide
      Amareshwari Sriramadasu added a comment -

      meterTaskAttempt() is gathering statistics for an attempt that just finished, I think it should also be in wasAttemptRunning check.

      Show
      Amareshwari Sriramadasu added a comment - meterTaskAttempt() is gathering statistics for an attempt that just finished, I think it should also be in wasAttemptRunning check.
      Hide
      Arun C Murthy added a comment -

      Rahul, Amareshwari is right... it's reasonable to move the call to meterTaskAttempt too.

      Show
      Arun C Murthy added a comment - Rahul, Amareshwari is right... it's reasonable to move the call to meterTaskAttempt too.
      Hide
      rahul k singh added a comment -

      incorporated amareshwari's comment

      Show
      rahul k singh added a comment - incorporated amareshwari's comment
      Hide
      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12427890/MAPRED-1143-6.patch
      against trunk revision 889786.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      -1 core tests. The patch failed core unit tests.

      -1 contrib tests. The patch failed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/190/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/190/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/190/artifact/trunk/build/test/checkstyle-errors.html
      Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/190/console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12427890/MAPRED-1143-6.patch against trunk revision 889786. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/190/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/190/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/190/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/190/console This message is automatically generated.
      Hide
      Arun C Murthy added a comment -

      +1 for the patch.

      Show
      Arun C Murthy added a comment - +1 for the patch.
      Hide
      rahul k singh added a comment -

      most of the test cases are failing in the build.
      with
      org.apache.hadoop.ipc.RPC.waitForProxy(Ljava/lang/Class;JLjava/net/InetSocketAddress;Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/ipc/VersionedProtocol;
      java.lang.NoSuchMethodError: org.apache.hadoop.ipc.RPC.waitForProxy(Ljava/lang/Class;JLjava/net/InetSocketAddress;Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/ipc/VersionedProtocol;

      This is happening on the trunk too , i.e , without applying the patch . This is in no way related to the current fix.

      Show
      rahul k singh added a comment - most of the test cases are failing in the build. with org.apache.hadoop.ipc.RPC.waitForProxy(Ljava/lang/Class;JLjava/net/InetSocketAddress;Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/ipc/VersionedProtocol; java.lang.NoSuchMethodError: org.apache.hadoop.ipc.RPC.waitForProxy(Ljava/lang/Class;JLjava/net/InetSocketAddress;Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/ipc/VersionedProtocol; This is happening on the trunk too , i.e , without applying the patch . This is in no way related to the current fix.
      Hide
      rahul k singh added a comment -

      opened jira 1292 for the above issue

      Show
      rahul k singh added a comment - opened jira 1292 for the above issue
      Hide
      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12427899/MAPRED-1143-7.patch
      against trunk revision 889786.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      -1 core tests. The patch failed core unit tests.

      -1 contrib tests. The patch failed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/325/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/325/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/325/artifact/trunk/build/test/checkstyle-errors.html
      Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/325/console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12427899/MAPRED-1143-7.patch against trunk revision 889786. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/325/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/325/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/325/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/325/console This message is automatically generated.
      Hide
      rahul k singh added a comment -

      tested the fix for yhadoop , manually, it works fine.

      Show
      rahul k singh added a comment - tested the fix for yhadoop , manually, it works fine.
      Hide
      rahul k singh added a comment -

      [exec] -1 overall.
      [exec]
      [exec] +1 @author. The patch does not contain any @author tags.
      [exec]
      [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
      [exec] Please justify why no tests are needed for this patch.
      [exec]
      [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
      [exec]
      [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
      [exec]
      [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
      [exec]
      [exec] -1 Eclipse classpath. The patch causes the Eclipse classpath to differ from the contents of the lib directories.
      [exec]
      [exec]
      [exec]
      [exec]
      [exec] ======================================================================
      [exec] ======================================================================
      [exec] Finished build.
      [exec] ======================================================================
      [exec] ======================================================================
      [exec]
      [exec]

      This fix doesnt have testcase , as it requires a significant change , we have provided for the trunk , manual testing is done to make sure that things are fine. for eclipse classpath its a known issue for 20 internal patch.

      Show
      rahul k singh added a comment - [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] -1 Eclipse classpath. The patch causes the Eclipse classpath to differ from the contents of the lib directories. [exec] [exec] [exec] [exec] [exec] ====================================================================== [exec] ====================================================================== [exec] Finished build. [exec] ====================================================================== [exec] ====================================================================== [exec] [exec] This fix doesnt have testcase , as it requires a significant change , we have provided for the trunk , manual testing is done to make sure that things are fine. for eclipse classpath its a known issue for 20 internal patch.
      Hide
      rahul k singh added a comment -

      ran ant test for yhadoop version , the test TestJobHistory failed , i re ran it locally it passed.

      Show
      rahul k singh added a comment - ran ant test for yhadoop version , the test TestJobHistory failed , i re ran it locally it passed.
      Hide
      rahul k singh added a comment -

      ran ant test-contrib.

      TestStreamingExitStatus was failing , but it is failing without the patch too.
      rest all tests passed .

      Show
      rahul k singh added a comment - ran ant test-contrib. TestStreamingExitStatus was failing , but it is failing without the patch too. rest all tests passed .
      Hide
      rahul k singh added a comment -

      the above comment is w.r.t to yahoo hadoop 20 version

      Show
      rahul k singh added a comment - the above comment is w.r.t to yahoo hadoop 20 version
      Hide
      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12427899/MAPRED-1143-7.patch
      against trunk revision 891111.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed core unit tests.

      -1 contrib tests. The patch failed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/205/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/205/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/205/artifact/trunk/build/test/checkstyle-errors.html
      Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/205/console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12427899/MAPRED-1143-7.patch against trunk revision 891111. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/205/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/205/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/205/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/205/console This message is automatically generated.
      Hide
      rahul k singh added a comment -

      Ran TestStreamingExitStatus runs independently fine on my local box.
      Looked at the error, the failure is not related to this patch.

      Show
      rahul k singh added a comment - Ran TestStreamingExitStatus runs independently fine on my local box. Looked at the error, the failure is not related to this patch.
      Hide
      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12427899/MAPRED-1143-7.patch
      against trunk revision 891524.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed core unit tests.

      -1 contrib tests. The patch failed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/211/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/211/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/211/artifact/trunk/build/test/checkstyle-errors.html
      Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/211/console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12427899/MAPRED-1143-7.patch against trunk revision 891524. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/211/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/211/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/211/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/211/console This message is automatically generated.
      Hide
      rahul k singh added a comment -

      patch for 21

      Show
      rahul k singh added a comment - patch for 21
      Hide
      Hemanth Yamijala added a comment -

      +1 for the 21 patch. I will commit this.

      Show
      Hemanth Yamijala added a comment - +1 for the 21 patch. I will commit this.
      Hide
      Hemanth Yamijala added a comment -

      I committed this to trunk and branch 0.21. Thanks, Rahul !

      Show
      Hemanth Yamijala added a comment - I committed this to trunk and branch 0.21. Thanks, Rahul !
      Hide
      Hudson added a comment -

      Integrated in Hadoop-Mapreduce-trunk #196 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/196/)

      Show
      Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #196 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/196/ )

        People

        • Assignee:
          rahul k singh
          Reporter:
          rahul k singh
        • Votes:
          0 Vote for this issue
          Watchers:
          5 Start watching this issue

          Dates

          • Created:
            Updated:
            Resolved:

            Development