Arun C Murthy made changes - 03/Jun/09 12:53 AM
Arun C Murthy made changes - 03/Jun/09 12:53 AM
Very early patch.
I haven't introduced a WAITING_FOR_SLOT task state since it might be desirable for the ExpireLaunchingTasks thread to actually kill high-ram jobs which have waited too long. Thoughts?
Arun C Murthy made changes - 03/Jun/09 12:58 AM
Arun C Murthy made changes - 05/Jun/09 07:06 AM
Reasonably well tested patch for review... this patch implements queuing of high-ram tasks (at most one task is queued) at the TaskTracker. It also contains necessary fixes to the UI and introduces a new 'waiting' state for the TIP etc.
Arun C Murthy made changes - 09/Jun/09 12:46 AM
Arun C Murthy made changes - 09/Jun/09 06:54 AM
After much consideration I decided to revert to the approach where we keep tasks on the JobTracker until sufficient memory is available. The problem with caching them on the TaskTracker is that it caused too many changes to the task status-reporting components, which are currently too hairy to muck with. I'm still testing the current patch.
Arun C Murthy made changes - 15/Jun/09 08:11 AM
Reasonably well tested patch, appreciate any feedback while I finish up last round of testing.
Arun C Murthy made changes - 17/Jun/09 06:31 AM
Arun, I've started looking at this patch. It did not apply on trunk with TestCapacityScheduler failing to merge. I tried to fix it - the conflict seemed to be only in an import statement. But when I ran the test case to check whether the merge was fine, I got the following failures:
junit.framework.AssertionFailedError: null
Particularly from the last test, I am hoping that it's only the test case that needs fixing, because in practice the patch seems to have actually increased the number of used slots. :) I will continue to look at the changes under this assumption, and get to the test cases in a bit.
Also, I wanted to note that this patch is changing the TaskScheduler interface. Can we reach out explicitly to folks working on the fair scheduler and dynamic scheduler, maybe by adding them to the watch list of this JIRA or something?
Some bug fixes and added counters to track how long tasks are held at the Scheduler after reserving tasktrackers...
Arun C Murthy made changes - 18/Jun/09 07:31 AM
Some notes about this patch:
I am looking at this patch as comprising three separate parts:
I've currently done the first part and part of the second. Some comments so far:
TaskTrackerStatus:
mapreduce.TaskTracker:
CapacityTaskScheduler:
JobConf:
JobInitializationPoller:
JobTracker:
Some nits:
Will continue with the review...
Thanks for the review Hemanth - as you pointed out the patch needs a bit more work to remove logging etc.
I'm attaching a patch which incorporates your feedback. Some clarifications:
Fixed.
We reserve all available slots since by definition all of them are for the same task, else we wouldn't reserve if we could run right away.
Yes.
Done. I've added a JobInitializationPoller.JobInitializationContext and use that rather than passing the scheduler.
My bad. Thanks for catching this. Fixed.
I really don't think it's a good idea to use both TaskTracker and TaskTrackerStatus in the long run, it's really hard to maintain. Which is why I bit the bullet and changed all of them.
Arun C Murthy made changes - 19/Jun/09 06:59 AM
Forgot to add that I've managed to successfully test this patch on large clusters.
I've looked at most of the code changes (excluding tests and examples). Here are a few more comments:
CapacityTaskScheduler:
JobTracker:
mapreduce.TaskTracker:
JobConf:
MemoryMatcher:
Owen O'Malley made changes - 20/Jun/09 07:59 AM
Thanks for the comments Hemanth. Here is another patch which incorporates all your comments.
Arun C Murthy made changes - 22/Jun/09 04:22 AM
Minor edit to previous version of this patch.
Arun C Murthy made changes - 22/Jun/09 06:07 AM
Updated patch, I had to fix some corner cases.
Arun C Murthy made changes - 23/Jun/09 07:21 AM
This is a patch that fixes failing capacity scheduler tests. Summary of changes:
With these changes, TestCapacityScheduler passes in my local machine.
Hemanth Yamijala made changes - 23/Jun/09 05:00 PM
Updated and hopefully final patch. Passes test cases and 'ant test-patch':
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 30 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
Arun C Murthy made changes - 24/Jun/09 06:15 AM
Arun C Murthy made changes - 24/Jun/09 06:16 AM
I looked at the last patch. It seems fine to me, except for one small problem I'd commented on earlier. In JobConf.compute.. methods, we should check if any of the memory parameters are not defined and, if so, return 1; otherwise we could end up computing negative values. The updated patch has only this one change.
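The guard described above can be sketched roughly as follows. This is an illustration only, not the code in the patch: the class name `SlotMath`, the method name `slotsPerTask`, and the convention that any non-positive value means "not defined" are all assumptions.

```java
// Hypothetical sketch; names and the "non-positive means undefined"
// convention are assumptions, not the actual JobConf API.
final class SlotMath {
    private SlotMath() {}

    /**
     * Number of slots a task should occupy, given the memory requested
     * by the task and the memory backing a single slot (both in MB).
     * Returns 1 when either parameter is undefined, so that an unset
     * (negative) value can never produce a negative slot count.
     */
    static int slotsPerTask(long taskMemoryMb, long slotMemoryMb) {
        if (taskMemoryMb <= 0 || slotMemoryMb <= 0) {
            return 1; // memory parameters not configured: plain one-slot task
        }
        // Round up: a task needing 1.5 slots' worth of memory takes 2 slots.
        long slots = (taskMemoryMb + slotMemoryMb - 1) / slotMemoryMb;
        return (int) Math.max(1L, slots);
    }
}
```

With these assumptions, a task asking for 2048 MB on 1024 MB slots would occupy 2 slots, while a task with an unset memory value would fall back to 1.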
Since currently this API is used only by CapacityTaskScheduler, I ran the relevant test cases in CS, and they passed. Results of test-patch output:
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 30 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] -1 Eclipse classpath. The patch causes the Eclipse classpath to differ from the contents of the lib directories.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
I checked with Giri on the -1 on eclipse classpath, and he told me that this could be ignored, because it doesn't run very well with Ivy. Based on this, I will commit the patch.
Hemanth Yamijala made changes - 24/Jun/09 02:09 PM
Hemanth, the JobConf.compute* methods do not have access to the memory parameters since they are in the JobTracker/Scheduler's conf?
Never mind - your changes seem fine.
The only concern is that the patch you just uploaded seems much smaller (168K) than the one I uploaded last night (174K). Can you please check?
Arun and I checked the difference in file sizes. The difference is that Arun was using git and I was using SVN and it seems SVN generated patches are smaller. The number of modified files in the patch also matches. So, I think it is OK.
I just committed this. Thanks, Arun !
Hemanth Yamijala made changes - 24/Jun/09 02:43 PM
Patch for yahoo 0.20 branch.
Arun C Murthy made changes - 24/Jun/09 02:48 PM
Attaching latest patch for internal Y! 20, with the modification to JobConf as per Hemanth's latest patch.
Also, ran TestCapacityScheduler to check if the test passes successfully.
Sreekanth Ramakrishnan made changes - 25/Jun/09 10:22 AM
Updated to reflect changes to yhadoop-0.20.
Arun C Murthy made changes - 29/Jun/09 10:17 PM
Forgot to do "--no-prefix". Fixed now.
Arun C Murthy made changes - 30/Jun/09 12:17 AM
Integrated in Hadoop-Mapreduce-trunk #15 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/15/)
Attaching Yahoo! distribution patch.
Sreekanth Ramakrishnan made changes - 19/Aug/09 08:29 AM
Once the accounting for reserved slots is fixed, it would automatically ensure that a HighRAMJob can only reserve slots up to the quota of the queue it belongs to. Hence the next enhancement is to pick specific slots and hold them rather than hold slots on every TaskTracker.
Picking slots for High RAM Jobs
The key to better support for HighRAMJobs is to reserve slots on specific TaskTrackers. Of course one could get arbitrarily clever while picking slots; factors to be considered are:
For the first cut, I'd propose we consider only locality and not expected time. Once we fix speculative execution (HADOOP-2141), we will have more of the necessary features to predict expected time etc., hence the pushback.
Accounting for Reserved Slots
It is critical that we charge the queues of the HighRAMJobs when we hold reserved slots for them, to ensure that they stay under their capacity and can't run away with slots in the cluster. The proposal is to charge jobs/queues immediately when we reserve slots on a TaskTracker (when the task can't be run immediately).
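As a rough illustration of charging at reservation time, the sketch below keeps a per-queue count of running and reserved slots and refuses a reservation that would push the queue past its capacity. The class `QueueSlotAccounting` and all of its methods are hypothetical names invented for this example; they are not taken from the CapacityTaskScheduler code.

```java
// Hypothetical sketch of per-queue accounting; not the scheduler's real classes.
final class QueueSlotAccounting {
    private final int capacitySlots; // slots the queue is entitled to
    private int runningSlots;        // slots occupied by running tasks
    private int reservedSlots;       // slots held for tasks that cannot run yet

    QueueSlotAccounting(int capacitySlots) {
        this.capacitySlots = capacitySlots;
    }

    /** Slots currently charged to the queue: running plus reserved. */
    int occupied() {
        return runningSlots + reservedSlots;
    }

    /**
     * Charge the queue as soon as slots are reserved. The reservation is
     * refused if it would take the queue over its capacity.
     */
    boolean reserve(int slots) {
        if (slots <= 0 || occupied() + slots > capacitySlots) {
            return false;
        }
        reservedSlots += slots;
        return true;
    }

    /** The reserved task launches: the charge moves from reserved to running. */
    void launchFromReservation(int slots) {
        if (slots <= 0 || slots > reservedSlots) {
            throw new IllegalArgumentException("not that many reserved slots: " + slots);
        }
        reservedSlots -= slots;
        runningSlots += slots;
    }
}
```

Because the charge is taken when the reservation is made, launching the task later does not change the queue's total; only the split between reserved and running moves.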
Metering
While metering HighRAMJobs, it would be incorrect to meter jobs (slot-hours etc.) by equating reserved slots to running slots. The proposal is to meter HighRAMJobs for open-but-held slots and running slots. (Open but held slots are those which are free on the TaskTracker but are being held while more become available for the HighRAMJob's tasks.)
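One possible reading of this metering rule, sketched under assumptions: slots that are reserved but still occupied by another task are not metered, while open-but-held and running slots are. The class `HighRamMeter`, its method names, and the choice of slot-milliseconds as the unit are all hypothetical.

```java
// Hypothetical sketch of the metering rule; names and units are assumptions.
final class HighRamMeter {
    private long meteredSlotMillis;

    /**
     * Record an interval during which the slot counts stayed constant.
     *
     * @param runningSlots         slots running the job's tasks
     * @param openHeldSlots        free slots being held for the job
     * @param reservedButBusySlots reserved slots still running other tasks
     * @param intervalMillis       length of the interval
     */
    void record(int runningSlots, int openHeldSlots,
                int reservedButBusySlots, long intervalMillis) {
        // Reserved-but-busy slots are deliberately excluded: they are still
        // doing work for another job, so they are not equated to running slots.
        meteredSlotMillis += (long) (runningSlots + openHeldSlots) * intervalMillis;
    }

    long meteredSlotMillis() {
        return meteredSlotMillis;
    }
}
```

For example, an interval of 1000 ms with 2 running slots, 1 open-but-held slot and 3 reserved-but-busy slots would add 3000 slot-milliseconds under this reading.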
Notes on Implementation and Challenges
As discussed above, the proposal is to consider just data-locality while reserving slots. Assuming this, there are a couple of implementation choices once we have reserved the slot:
Proposal 1
Here we would introduce a queue of ready-to-run tasks at the TaskTracker and fill it with the tasks of the HighRAMJobs.
Pros
Cons
Proposal 2
Here we would start marking slots as reserved (per task per job) and maintain information to assign the slot to the task when it eventually does free up.
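A minimal sketch of the bookkeeping this proposal implies, assuming at most one reservation per tracker and that a reservation is only made when the task cannot run right away. The class `ReservableTracker` and its methods are hypothetical and do not correspond to the real TaskTracker or JobTracker classes.

```java
import java.util.Optional;

// Hypothetical sketch of Proposal 2 bookkeeping; not actual Hadoop classes.
final class ReservableTracker {
    private final int totalSlots;
    private int busySlots;
    private String reservedTaskId; // null when nothing is reserved
    private int slotsNeeded;

    ReservableTracker(int totalSlots, int busySlots) {
        this.totalSlots = totalSlots;
        this.busySlots = busySlots;
    }

    int freeSlots() {
        return totalSlots - busySlots;
    }

    /**
     * Reserve this tracker for a task that needs more slots than are free.
     * Refused if a reservation already exists, if the task could never fit,
     * or if it could run right away (no reservation needed in that case).
     */
    boolean reserve(String taskId, int needed) {
        if (reservedTaskId != null || needed > totalSlots || needed <= freeSlots()) {
            return false;
        }
        reservedTaskId = taskId;
        slotsNeeded = needed;
        return true;
    }

    /**
     * A running task released one slot. Freed slots are held for the
     * reservation; once enough are free, the reserved task is handed back
     * for launch and its slots are marked busy.
     */
    Optional<String> slotFreed() {
        if (busySlots == 0) {
            throw new IllegalStateException("no busy slot to free");
        }
        busySlots--;
        if (reservedTaskId == null || freeSlots() < slotsNeeded) {
            return Optional.empty();
        }
        String task = reservedTaskId;
        busySlots += slotsNeeded;
        reservedTaskId = null;
        slotsNeeded = 0;
        return Optional.of(task);
    }
}
```

In this sketch a fully busy 4-slot tracker reserved for a 2-slot task returns nothing when the first slot frees up and returns the task when the second one does.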
Pros
Cons
Recommendation
User Interface
It is important for users (and queue-admins) to understand that there are slots reserved for HighRAMJobs, which result in fewer running maps/reduces w.r.t. the queue capacities. It would be nice to add reserved slots to the JobTracker/Job UI, and also to the Queue-Info in the Scheduler page.