Hadoop Map/Reduce
MAPREDUCE-4499

Looking for speculative tasks is very expensive in 1.x

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.3
    • Fix Version/s: 1.2.0
    • Component/s: mrv1, performance
    • Labels: None

      Description

      When there are lots of jobs and tasks active in a cluster, the process of figuring out whether or not to launch a speculative task becomes very expensive.

      I could be missing something, but it certainly looks like on every heartbeat we could be scanning tens of thousands of tasks looking for something that might need to be speculatively executed. In most cases nothing gets chosen, so we completely trash our data cache and don't even find a task to schedule, just to do it all over again on the next heartbeat.

      On busy jobtrackers, the following backtrace is very common:

      "IPC Server handler 32 on 50300" daemon prio=10 tid=0x00002ab36c74f800
      nid=0xb50 runnable [0x0000000045adb000]
      java.lang.Thread.State: RUNNABLE
      at java.util.TreeMap.valEquals(TreeMap.java:1182)
      at java.util.TreeMap.containsValue(TreeMap.java:227)
      at java.util.TreeMap$Values.contains(TreeMap.java:940)
      at
      org.apache.hadoop.mapred.TaskInProgress.hasRunOnMachine(TaskInProgress.java:1072)
      at
      org.apache.hadoop.mapred.JobInProgress.findSpeculativeTask(JobInProgress.java:2193)

      • locked <0x00002aaefde82338> (a
        org.apache.hadoop.mapred.JobInProgress)
        at
        org.apache.hadoop.mapred.JobInProgress.findNewMapTask(JobInProgress.java:2417)
      • locked <0x00002aaefde82338> (a
        org.apache.hadoop.mapred.JobInProgress)
        at
        org.apache.hadoop.mapred.JobInProgress.obtainNewNonLocalMapTask(JobInProgress.java:1432)
      • locked <0x00002aaefde82338> (a
        org.apache.hadoop.mapred.JobInProgress)
        at
        org.apache.hadoop.mapred.CapacityTaskScheduler$MapSchedulingMgr.obtainNewTask(CapacityTaskScheduler.java:525)
        at
        org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.getTaskFromQueue(CapacityTaskScheduler.java:322)
        at
        org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.assignTasks(CapacityTaskScheduler.java:419)
        at
        org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.access$500(CapacityTaskScheduler.java:150)
        at
        org.apache.hadoop.mapred.CapacityTaskScheduler.addMapTasks(CapacityTaskScheduler.java:1075)
        at
        org.apache.hadoop.mapred.CapacityTaskScheduler.assignTasks(CapacityTaskScheduler.java:1044)
      • locked <0x00002aab6e27a4c8> (a
        org.apache.hadoop.mapred.CapacityTaskScheduler)
        at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3398)
      • locked <0x00002aab6e191278> (a org.apache.hadoop.mapred.JobTracker)
        ...)
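
      The hot frames above are the TreeMap.containsValue path: hasRunOnMachine apparently answers "has any attempt of this TIP run on this tracker?" by scanning the values of a TreeMap, which has no index on values and must visit every entry. A minimal, self-contained sketch of that cost pattern (the map contents here are illustrative, not the actual TaskInProgress fields):

        import java.util.TreeMap;

        public class ContainsValueCost {
            public static void main(String[] args) {
                // Hypothetical attempt-id -> tracker-name map.
                TreeMap<String, String> attempts = new TreeMap<>();
                for (int i = 0; i < 100_000; i++) {
                    attempts.put("attempt_" + i, "tracker_" + (i % 2000));
                }

                long t0 = System.nanoTime();
                boolean byKey = attempts.containsKey("attempt_99999");   // O(log n)
                long t1 = System.nanoTime();
                // O(n): TreeMap$Values.contains -> TreeMap.containsValue
                // -> TreeMap.valEquals, the exact frames in the backtrace.
                boolean byValue = attempts.containsValue("no_such_tracker");
                long t2 = System.nanoTime();

                System.out.printf("containsKey:   %b in %d ns%n", byKey, t1 - t0);
                System.out.printf("containsValue: %b in %d ns%n", byValue, t2 - t1);
            }
        }

      Each individual map may be small, but with tens of thousands of candidate tasks scanned per heartbeat, even a cheap linear walk repeated that often adds up.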

          Activity

          Matt Foley added a comment -

          Closed upon release of Hadoop 1.2.0.
          Thomas Graves added a comment -

          thanks Koji, I've committed this to branch-1.
          Thomas Graves added a comment -

          +1, looks good. The findbugs warnings exist on branch-1 without this change.

          [exec] -1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
          [exec] Please justify why no tests are needed for this patch.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] -1 findbugs. The patch appears to introduce 8 new Findbugs (version 1.3.9) warnings.
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12541264/mapreduce-4499-v1.0.2-1.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2785//console

          This message is automatically generated.
          Nathan Roberts added a comment -

          Ideally we'd be much less aggressive about calling findSpeculativeTask() (2 million calls over a few seconds is a bit much). However, this patch looks to be a safe way to mitigate the problem by avoiding the seemingly expensive call to hasRunOnMachine(). I double-checked the truth table against the pseudo-code and the patch, and they look good to me.
          Koji Noguchi added a comment -

          Attaching a patch with the if/else rewrite. It changes the order of the boolean conditions without changing the logic.
          Koji Noguchi added a comment -

          Looked at one of the busy JobTrackers. Attached btrace for a couple of seconds and counted the booleans.

          Out of 2093791 JobInProgress.findSpeculativeTask calls, 2437 of them had shouldRemove=true.
          Out of 2213670 TaskInProgress.hasSpeculativeTask calls, 137 of them were 'true'.

          Of course these numbers differ from cluster to cluster, but I believe they show the possibility of some savings.
          Koji Noguchi added a comment -

          Assuming tip.hasRunOnMachine is expensive, we can reorder the if/else so that we call it less often.

          From JobInProgress.java (around line 2196):

            if (!tip.hasRunOnMachine(ttStatus.getHost(),
                                     ttStatus.getTrackerName())) {
              if (tip.hasSpeculativeTask(currentTime, avgProgress)) {
                // In case of shared list we don't remove it. Since the TIP failed
                // on this tracker can be scheduled on some other tracker.
                if (shouldRemove) {
                  iter.remove(); //this tracker is never going to run it again
                }
                return tip;
              }
            } else {
              // Check if this tip can be removed from the list.
              // If the list is shared then we should not remove.
              if (shouldRemove) {
                // This tracker will never speculate this tip
                iter.remove();
              }
            }

          Checking the action for each true/false combination:

          tip.hasRun   tip.hasSpeculative   shouldRemove   Action
              F               F                  F         -
              F               F                  T         -
              F               T                  F         return tip
              F               T                  T         iter.remove() & return tip
              T               F                  F         -
              T               F                  T         iter.remove()
              T               T                  F         -
              T               T                  T         iter.remove()

          Can we rewrite the above logic as:

            if (tip.hasSpeculative) {
              if (shouldRemove) {
                iter.remove();
              }
              if (!tip.hasRun) {
                return tip;
              }
            } else {
              if (shouldRemove && tip.hasRun) {
                iter.remove();
              }
            }

          From the jstacks we see, I can tell that shouldRemove is often 'false' in our case. Depending on the value of tip.hasSpeculative, we may reduce the number of tip.hasRun calls with this rewrite.
          (I don't know how often 'hasSpeculative' is true.)
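
          To make the reordering concrete, here is a small, self-contained sketch of the rewritten scan, assuming the method names from the quoted code (hasSpeculativeTask, hasRunOnMachine); the Tip interface and the findSpeculativeTask signature below are illustrative stand-ins, not the actual org.apache.hadoop.mapred internals.

            import java.util.Iterator;
            import java.util.List;

            public class SpeculativeScanSketch {

                // Hypothetical stand-in for TaskInProgress.
                interface Tip {
                    // Rarely true in practice (137 of ~2.2M calls in the btrace sample).
                    boolean hasSpeculativeTask(long currentTime, double avgProgress);
                    // Seemingly expensive: the backtrace shows a TreeMap.containsValue
                    // linear scan under this call.
                    boolean hasRunOnMachine(String host, String trackerName);
                }

                // Reordered per the truth table: the rarely-true hasSpeculativeTask
                // check comes first, so hasRunOnMachine is called only when its
                // result can change the outcome.
                static Tip findSpeculativeTask(List<Tip> candidates, String host,
                                               String trackerName, boolean shouldRemove,
                                               long currentTime, double avgProgress) {
                    for (Iterator<Tip> iter = candidates.iterator(); iter.hasNext(); ) {
                        Tip tip = iter.next();
                        if (tip.hasSpeculativeTask(currentTime, avgProgress)) {
                            if (shouldRemove) {
                                iter.remove();
                            }
                            if (!tip.hasRunOnMachine(host, trackerName)) {
                                return tip;
                            }
                        } else if (shouldRemove
                                   && tip.hasRunOnMachine(host, trackerName)) {
                            iter.remove();
                        }
                    }
                    return null;
                }
            }

          With shouldRemove and hasSpeculativeTask both false in the overwhelmingly common case, the scan now skips hasRunOnMachine entirely on that path, which is where the savings come from.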

            People

            • Assignee: Koji Noguchi
            • Reporter: Nathan Roberts
            • Votes: 0
            • Watchers: 13
