Issue Details (XML | Word | Printable)

Key: HADOOP-2790
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Owen O'Malley
Reporter: Owen O'Malley
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

TaskInProgress.hasSpeculativeTask is very inefficient

Created: 07/Feb/08 02:51 AM   Updated: 08/Jul/09 04:52 PM
Return to search
Component/s: None
Affects Version/s: None
Fix Version/s: 0.17.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works 2790-2.patch 2008-02-29 07:51 AM Owen O'Malley 6 kB
Text File Licensed for inclusion in ASF works 2790.patch 2008-02-15 05:47 PM Owen O'Malley 6 kB
Issue Links:
Reference
 

Resolution Date: 02/Mar/08 08:39 AM


 Description  « Hide
Each call to JobInProgress.findNewTask can call TaskInProgress.hasSpeculativeTask once per a task. Each call to hasSpeculativeTask calls System.getCurrentTimeMillis, which can result in hundreds of thousands of calls to getCurrentTimeMillis. Additionally, it calls TaskInProgress.isOnlyCommitPending, which calls .values() on the map from task id to host name and iterates through them to see if any of the tasks are in commit pending. It would be better to have a commit pending boolean flag in the TaskInProgress. It also looks like there are other opportunities here, but those jumped out at me. We should also look at this method in the profiler.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Runping Qi added a comment - 07/Feb/08 04:54 AM

This problem may well be one of the reasons for hadoop-2119.

Also, another obvious optimization is to check whether the speculative
execution flag is true up front.


Amar Kamat added a comment - 07/Feb/08 06:57 AM

Also, another obvious optimization is to check whether the speculative execution flag is true up front.

Even I noticed that few days back. But I thought HADOOP-2141 might fix that.


With HADOOP-2119, the calls to hasSpeculative() might reduce since we are optimizing the look-ups for finding the higher priority runnable tasks and totally avoiding speculative ones in these look-ups. So the check for speculative tasks will be done only if we have nothing else to run. But +1 to do it better than making all the checks all the time.
Following are the parameters used for deciding TaskInProgress.hasSpeculative() :

  • activeTasks.size() <= MAX_TASK_EXECS [seems ok]
  • runSpeculative [should be done earlier, but ok]
  • averageProgress - progress >= SPECULATIVE_GAP [seems ok]
  • System.currentTimeMillis() - startTime >= SPECULATIVE_LAG :
    This could be checked once in TaskInProgress.recomputeProgress() and a check will only be done in hasSpeculative() if the earlier check resulted as false. I guess we can still do better but my guess is that we cant totally avoid System.currentTimeMillis() in TaskInProgress.hasSpeculative(), no?
  • completes == 0 [ok]
  • !isOnlyCommitPending() :
    May be a Map for COMMIT_PENDING tasks can be maintained in TaskInProgress and the only check made is commitPendingStatuses.size() > 0 && commitPendingStatuses.contains(taskId). The space requirement will be same with a re-arrangement to be done in TaskInProgress.recomputeProgress().

    Comments?


Amar Kamat added a comment - 07/Feb/08 08:34 AM

!isOnlyCommitPending() :
May be a Map for COMMIT_PENDING tasks can be maintained in TaskInProgress and the only check made is commitPendingStatuses.size() > 0 && commitPendingStatuses.contains(taskId). The space requirement will be same with a re-arrangement to be done in TaskInProgress.recomputeProgress().

Actually the list of task statuses will be pretty small so either we can do what is currently done or maintain a boolean flag as Owen mentioned, +1.

System.currentTimeMillis() - startTime >= SPECULATIVE_LAG

As suggested by Devaraj, the time can be calculated in JobInProgress.findNewTask() and use this value in TaskInProgress.hasSpeculative(). The chances of ignoring a TIP for speculation that should be speculated will be extremely low as compared to using the time in TaskInProgress.recomputeProgress().


Owen O'Malley added a comment - 07/Feb/08 03:40 PM

As suggested by Devaraj, the time can be calculated in JobInProgress.findNewTask() and use this value in TaskInProgress.hasSpeculative().

+1 this is the clear solution

Please do move the runSpeculative check up.

I think the boolean for the commit pending would be pretty easy. If we can avoid calling .values(), we will avoid creating a second collection for each tip. Another point is that some of our customers run with the max task failures set to 100, so it is not free to scan the tasks in a tip.


Owen O'Malley added a comment - 15/Feb/08 05:47 PM
This patch does the easy part of the improvement. I really should benchmark it before it goes in. It moves the check for speculative execution much earlier, and it does 1 get time of day.

Arun C Murthy added a comment - 25/Feb/08 10:09 PM
+1

Owen O'Malley added a comment - 29/Feb/08 01:04 AM
Before the patch, findNewTask took 60s, with the patch it took 53s in my test case using Yourkit.

Hadoop QA added a comment - 29/Feb/08 04:04 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12375694/2790.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included -1. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

patch -1. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1871/console

This message is automatically generated.


Owen O'Malley added a comment - 29/Feb/08 07:51 AM
Resolved conflict with 1985.

Hadoop QA added a comment - 29/Feb/08 10:15 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376796/2790-2.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included -1. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1875/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1875/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1875/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1875/console

This message is automatically generated.


Devaraj Das added a comment - 01/Mar/08 09:07 AM
Owen, could you please submit one patch for 0.16 branch. This patch doesn't apply cleanly there.

Hudson added a comment - 01/Mar/08 12:15 PM

Owen O'Malley added a comment - 02/Mar/08 08:39 AM
Devaraj, the original patch would have worked for 0.16, but for now l decided to just put it in trunk.

Hudson added a comment - 02/Mar/08 12:08 PM