Vivek Ratan added a comment - 31/Jul/08 06:29 AM
Matei brought up this issue
My guess is, we're going to have some 'core' schedulers (which seem appropriate to belong within the core Hadoop code), and some that are better suited as contrib projects. We should probably place core schedulers under src/mapred/schedulers. So, for example, the 3445 scheduler would go under src/mapred/schedulers/3445 and the 3746 scheduler under src/mapred/schedulers/3746 (replace '3445' and '3746' with more appropriate names, if you wish). Others may go in here as well, if it's felt that they're likely to be deployed in many scenarios, or for whatever other reason. Non-core schedulers can probably go under contrib.
Another option is to place core schedulers under src/core/schedulers, if we want these to be more than just MR schedulers, but folks may not write non-MR schedulers for a while. I prefer keeping stuff under mapred.

Just a note, if we do subpackages, we will need a semi-public scheduler API (HADOOP-3822), because the default (package-private) visibility in Java doesn't extend to subpackages. On the other hand, I think subpackages are definitely the way to go to make this scalable and clean.
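To illustrate the visibility point above, here is a minimal sketch. The class JobQueue and the org.example package names are hypothetical stand-ins, not real Hadoop types. It only shows the language rule: a member declared without an access modifier is reachable from the exact same package and nowhere else, so a scheduler living in a subpackage would need the members it uses to be made public.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class VisibilityDemo {

    // Hypothetical stand-in for a core MapReduce class; not a real Hadoop type.
    public static class JobQueue {
        // No access modifier: package-private, visible only inside the declaring package.
        int pendingJobs() { return 3; }

        // Public: the kind of member a semi-public scheduler API would have to expose.
        public int runningJobs() { return 1; }
    }

    // Returns true if the named no-argument method has default (package-private) access.
    public static boolean isPackagePrivate(Class<?> owner, String methodName) {
        try {
            Method m = owner.getDeclaredMethod(methodName);
            int mod = m.getModifiers();
            return !Modifier.isPublic(mod)
                && !Modifier.isProtected(mod)
                && !Modifier.isPrivate(mod);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + methodName, e);
        }
    }

    // Java's rule for package-private access: the caller's package must be
    // exactly the owner's package. Sharing a name prefix grants nothing.
    public static boolean sharesPackage(String callerPackage, String ownerPackage) {
        return callerPackage.equals(ownerPackage);
    }

    public static void main(String[] args) {
        // Hypothetical package names, chosen only for illustration.
        String core = "org.example.mapred";
        String sub = "org.example.mapred.schedulers.fair";

        System.out.println("pendingJobs package-private: "
            + isPackagePrivate(JobQueue.class, "pendingJobs"));
        System.out.println("runningJobs package-private: "
            + isPackagePrivate(JobQueue.class, "runningJobs"));
        System.out.println("subpackage may call package-private members: "
            + sharesPackage(sub, core));
    }
}
```

In other words, moving a scheduler from the core package into a subpackage turns every package-private call it makes into a compile error, which is why the API work in HADOOP-3822 comes up here.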
+1 on subpackages (either in src/mapred or src/contrib).
Also, I think that we need to split out the MapReduce daemons into a server package (HADOOP-3916), for which we won't publish javadoc, before we can do HADOOP-3822 properly. This argues for committing the fair scheduler

Sorry to add my $0.02 so late in the process, but since you already use Torque and Condor to actually spawn the Hadoop clusters and start jobs, have you considered adding functionality to HOD to allow external, widely used schedulers (such as the open-source Maui, Moab, PBSpro, LSF, etc.) to control the scheduling of HOD clusters and jobs via the above-mentioned API (or the command-line or web service APIs)? This would let sites that already run an existing scheduler add the ability to run Hadoop jobs while taking advantage of their existing infrastructure in terms of users, SLAs, priorities, accounts, groups, etc.
Thanks for all the effort!

David,
Some clarifications:
Hope that answers some of your questions.