|
For those interested, here is a preliminary patch that implements the fair scheduler and works with the 9.2 patch for
The only caveat with this patch is that after you build it, you have to set HADOOP_CLASSPATH to include build/contrib/poolscheduler/classes in hadoop-env.sh because the classes get placed there. Also, you must create a pool config file (say conf/pools) and set the jobconf variable "facebook.scheduler.allocation.file" to point to it. This file can be empty, or it can contain lines of the format <poolName> <minMappers> <minReducers> to specify minimum allocations for various pools. The patch supports fair sharing with weights by priority (every priority level gets 2x the weight of the previous level) and pools with minimum numbers of reducers and mappers. The pool config file is reloaded every 10 seconds so allocations can be changed while a cluster is running. The patch also includes some general tools that might be useful to schedulers, such as JobSelector and its subclasses, and the static methods in JobUtils. Vivek - You're right, JIRAs like this one and
Some of the code here might be useful for At the same time, I want to add a few extensibility points to this scheduler. First of all, I want to add a way to extend the weight and deficit calculations, perhaps by providing multiple Adjuster classes that can be chained together. This could be used for example to boost priority of new jobs during their first few minutes (reducing response times for interactive queries), to take into account locality when deciding which job to assign to each slot, etc. Second, I've already implemented an interface called LoadManager that takes care of how man tasks should run on each taskTracker. This currently uses the caps, but an alternate implementation that we might try is to assign caps based on load (start more tasks on nodes where the running tasks are not utilizing all of the CPU, bandwidth and memory). The scheduler is also pretty modular and it's easy to change or reuse particular components, like the JobSelectors. Here is an updated patch which simplifies the logic by putting it into FairSharingJobSelector and also fixes a bug in the previous calculation of fair share (it didn't take into account minimums, so jobs would consider themselves "starved" if some job with a large allocation was running). Once
linking to
Here is a new patch incorporating a lot of improvements. There highlights are bug fixes to the fair sharing calculations, support for speculative execution (using the API in https://issues.apache.org/jira/browse/HADOOP-3840
Almost all the code is in contrib, but there is one change in mapred to make the activeTasks variable in TaskInProgress accessible to other classes in the same package - this is necessary for properly counting active tasks in each job. The code is more or less usable as is. A few things I want to add include unit tests and support on the web UI for changing jobs' priorities and pools, and a README file about the config options. GIven that most of the changes are in contrib, what's will be the process for getting this reviewed and committed once it is testable? I also have another question for the folks watching this issue. So far I've organized my patch to have my code in contrib, but would it be better to have it in src/java/mapred instead? I think this would have two advantages. First, other schedulers, such as
I've been thinking of this as well, and have had a few discussions with folks. I've opened a separate Jira (HADOOP-3876). we should discuss options there. Sound good?
New patch that includes unit tests, formatting according to Hadoop code style, and more complete documentation.
New patch that includes unit tests, formatting according to Hadoop code standards, and documentation.
Cool. The UI looks great.
1. Can the scheduler look for modification time of the config file to decide whether to read it in or not? -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12387391/fairscheduler-v5.patch against trunk revision 682978. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. -1 release audit. The applied patch generated 216 release audit warnings (more than the trunk's current 215 warnings). +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3005/testReport/ This message is automatically generated. This is looking good, but I'm worried about a hand full of schedulers all putting a bunch of classes into the mapred package. I think we either need to put the schedulers into packages or put them into contrib. Thoughts?
Hi Owen,
It's definitely possible to put the schedulers in different classes New patch with one additional feature - ability to set jobs' weights based on their size (so bigger jobs get bigger shares).
Also, I don't understand the release audit warning; what does it mean? Matei, the release audit warning usually means one of the files in the patch doesn't have the ASF license header. Typically this is the case if you have new configuration files. If that's the case, it can be ignored. I just add a comment to the JIRA explaining the warning is due to this reason.
You might have some files with missing licenses... In this case though, it looks like TestFairScheduler doesn't have the header. So, you'll need to add that. Details were found here: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3005/artifact/trunk/current/releaseAuditDiffWarnings.txt
Thanks for the clarification, I'll put it in the next version.
> I'm worried about a hand full of schedulers all putting a bunch of classes into the mapred package.
+1 > I think we either need to put the schedulers into packages or put them into contrib. I would vote to commit this into contrib. If we wanted to move it into core at a later date we should put it in its own package, but that really needs HADOOP-3916 and HADOOP-3822 to be done first. A few comments:
for (TaskType taskType: EnumSet.allOf(TaskType.class)) {
}
Thanks for your comments, Tom! Here's a new patch, including fixed license headers, a build file, no scheduler singleton and a detailed ReadMe.
Regarding the enum values - I've actually used for (TaskType type: TaskType.values()) to iterate through them in some places, but in assignTasks in particular, I wanted to make sure I look at maps first and then reduces, because I want to assign maps before reduces. Although values() / allOf() probably give the values in the order they're defined in the enum, I wanted to make it explicit. Here is hopefully the last patch in terms of features. This one adds support for limiting the number of running jobs in from each queue and from each user, similar to the behavior specified in HADOOP-3421. This is useful for clusters where many jobs are submitted at once by the same entity and we wish to limit how many can run at the same time to improve performance (by having each job finish faster, using up less temporary storage, etc). This patch also changes the config file format to XML so that more types of configuration elements can be included. There are three new unit tests for this functionality and an example config file in the ReadMe.
+1
This looks good to me. The README is excellent. > Although values() / allOf() probably give the values in the order they're defined in the enum, I wanted to make it explicit. Fair enough - there's not much chance of a new value being added here, so it's fine. (Both the values() method and the EnumSet#allOf() method guarantee to return the values in the order they are defined in the enum. See http://java.sun.com/docs/books/jls/third_edition/html/classes.html#302265 -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12388201/fairscheduler-v6.patch against trunk revision 686362. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3062/testReport/ This message is automatically generated. Resubmitting patch because auto-tester seemed to be broken.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12388201/fairscheduler-v6.patch against trunk revision 687868. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3079/testReport/ This message is automatically generated. I think this patch should be committed. We have been running this internally for the last few weeks without any problems. The unit test failures are not caused by this patch.
I just committed this. Thanks, Matei!
Thanks Owen! When
TestFairScheduler failed on Linux. See
Matei, we are working on a version of Integrated in Hadoop-trunk #589 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/589/
I can't get the patch to apply to 0.18.1, has anyone tried this? It seems like an ant build config problem. Any help would be greatly appreciated.
Hi Eric,
Are you trying to use the patch attached to this JIRA? This patch is for 0.19, which introduced API support for plugging in different schedulers (see For those interested, here is a patch for 0.18.1, which includes
The 0.18.1 patch should also work with 0.18.2, but doesn't work with 0.18.3. I've attached a patch for 0.18.3.
(I had previously created this 0.18.3 patch for Cloudera by the way, so don't assume that I hacked it together in 10 minutes and it's untested! It required a small code change since the JobTracker code had changed enough to prevent the old patch from applying.)
The 0.18.3 patch attached to this ticket introduces a race condition (documented in
Attached are a delta since the previous patch as well as a new patch that incorporates both. I ran some MiniMR test cases and they passed, though I have not run the full suite.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I just wanted to add:
HADOOP-3445supports user limits (a single user can only use up to a proportion of the queue), which, we think, will prevent starvation (when combined with capacities for queues). This feature should also solve HADOOP-2573.It'll be really interesting to see how these two schedulers handle user problems today, such as starvation of jobs, or fairness problems. I think there're a lot of similarities in the approaches, but perhaps enough differences that will manifest in different ways on a cluster. Hopefully, once we have patches out for both Jiras, and enough experience using them, we can find some common solutions to common problems with Hadoop scheduling. One of our goals, driven by the work for 3412, is to make the scheduler for 3421 extensible, so folks can tweak algorithms at different levels and find what works best for them, as well as enhance the default algorithm. I hope that the same is possible with your design - it'll be a very useful feature.
My real point is, between
HADOOP-3412, HADOOP-3421 (and its implementation offshoots, includingHADOOP-3445), andHADOOP-3746, as well as other Jiras these might influence, there's a lot of good work going on now in dealing with Hadoop scheduling and looking at effective ways to improve the utilization and usability of Hadoop. It's about time! We need this effort.