Type: New Feature
Affects Version/s: None
Fix Version/s: None
Hadoop On Demand (HOD) is an integration of Hadoop with batch schedulers like Condor/torque/sun grid etc. Hadoop On Demand or HOD hereafter is a system that populates a Hadoop instance using a shared batch scheduler. HOD will find a requested number of nodes and start up Hadoop daemons on them. Users map reduce jobs can then run on the hadoop instance. After the job is done, HOD gives back the nodes to the shared batch scheduler. A group of users will use HOD to acquire Hadoop instances of varying sizes and the batch scheduler will schedule requests in a way that important jobs gain more importance/resources and finish fast. Here are a list of requirements for HOD and batch schedulers:
Key Requirements :
— Should allocate the specified minimum number of nodes for a job
Many batch jobs can finish in time, only when enough resources are allocated. Therefore batch scheduler should allocate the asked number of nodes for a given job when the job starts. This is simple form of what's known as gang scheduling.
Often the minimum nodes are not available right away, especially if the job asked for a large number. The batch scheduler should support advance reservation for important jobs so that the wait time can be determined. In advance reservation, a reservation is created on earliest future point when the preoccupied nodes become available. When nodes are currently idle but booked by future reservations, batch scheduler is ok to give them to other jobs to increase system utilization, but only when doing so does not delay existing reservations.
— run short urgent job without costing too much loss to long job. Especially, should not kill job tracker of long job.
Some jobs, mostly short ones, are time sensitive and need urgent treatment. Often, large portion of cluster nodes will be occupied by long running jobs. Batch scheduler should be able to preempt long jobs and run urgent jobs. Then, urgent jobs will finish quickly and long jobs can re-gain the nodes afterward.
When preemption happens, HOD should minimize the loss to long jobs. Especially, it should not kill job tracker of long job.
— be able to dial up, at run time, share of resources for more important projects.
Viewed at high level, a given cluster is shared by multiple projects. A project consists of a number of jobs submitted by a group of users.Batch scheduler should allow important projects to have more resources. This should be tunable at run time as what projects deem more important may change over time.
— prevent malicious abuse of the system.
A shared cluster environment can be put in jeopardy if malicious or erroneous job code does:
– hold unneeded resources for a long period
– use privileges for unworthy work
Such abuse can easily cause under-utilization or starvation of other jobs. Batch scheduler should allow setting up policies for preventing resource abuse by:
– limit privileges to legitimate uses asking for proper amount
– throttle peak use of resources per player
– monitor and reduce starvation
— The behavior should be simple and predictable
When status of the system is queried, we should be able to determine what factors caused it to reach current status and what could be the future behavior with or without our tuning on the system.
— be portable to major resource managers
HOD design should be portable so that in future we are able to plugin other resource manager.
Some of the key requirements are implemented by the batch schedulers. The others need to be implemented by HOD.