Christian Kunz made changes - 26/Sep/08 07:57 PM
The situation becomes more complicated when some applications in a batch are Pipes applications and some are not. Among the Pipes applications, some might produce a large amount of data to shuffle, requiring the Java tasks to sort intensively, while others do not.
In summary, the mapping of the number of cores to mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum is not always straightforward. I'm not arguing that these are perfect, but permitting them to vary per node is a feature that we shouldn't toss out. Adding a different parameter that limits the number of tasks that a job would actually run simultaneously on a node might be reasonable. Thus I think extending the scheduler, as is done in
Okay, my bad. I went too far by requesting that the configuration parameters be moved to the job level instead of just adding job-level control.
Should the title be changed to something like
I am not sure if Doug was suggesting we use … That said, I also think we'll need to consider unifying the mechanisms of resource management at some point, maybe in the near future :) We already seem to have slightly different ways of dealing with cores, memory, and disk (a.k.a
This was my understanding as well.
The sooner, the better :) Currently one has to restart the framework whenever mapred.tasktracker.map.tasks.maximum or mapred.tasktracker.reduce.tasks.maximum is changed. Then, maybe, similar to the configuration knob mapred.tasks.maxmemory for memory, we could have mapred.job.{map|reduce}.tasks to specify the number of tasks a job occupies, while mapred.tasktracker.tasks.maxmemory would correspond to mapred.tasktracker.{map|reduce}.tasks.maximum. After that, similar to how
Notes:
I talked with Sameer offline, and we agreed to use a scheduler-based workaround until a more general solution for resource monitoring and utilization is available.
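To illustrate the kind of scheduler-side workaround discussed above (a job-level cap on how many of its tasks run simultaneously on one node, layered on top of the node's own slot limit), here is a minimal sketch. The class PerJobNodeCap, its methods tryAssign/release, and the jobMaxPerNode parameter are all hypothetical; they are not part of Hadoop's TaskScheduler API or any real configuration key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not Hadoop's actual scheduler API.
// Tracks tasks on a single TaskTracker and refuses a new task if either the
// node's slot limit (e.g. from mapred.tasktracker.map.tasks.maximum) or the
// job's own per-node cap (hypothetical jobMaxPerNode) would be exceeded.
class PerJobNodeCap {
    private final int nodeMaxSlots;                               // per-node limit from the node's config
    private final Map<String, Integer> running = new HashMap<>(); // jobId -> tasks running on this node
    private int totalRunning = 0;

    PerJobNodeCap(int nodeMaxSlots) {
        this.nodeMaxSlots = nodeMaxSlots;
    }

    // Returns true and records the task if the job may start one more task here.
    boolean tryAssign(String jobId, int jobMaxPerNode) {
        int jobRunning = running.getOrDefault(jobId, 0);
        if (totalRunning >= nodeMaxSlots || jobRunning >= jobMaxPerNode) {
            return false;
        }
        running.put(jobId, jobRunning + 1);
        totalRunning++;
        return true;
    }

    // Records that one task of the job finished on this node.
    void release(String jobId) {
        int jobRunning = running.getOrDefault(jobId, 0);
        if (jobRunning > 0) {
            running.put(jobId, jobRunning - 1);
            totalRunning--;
        }
    }

    public static void main(String[] args) {
        // Node has 4 slots; a sort-heavy job is capped at 2 tasks per node.
        PerJobNodeCap node = new PerJobNodeCap(4);
        System.out.println(node.tryAssign("heavyJob", 2)); // true
        System.out.println(node.tryAssign("heavyJob", 2)); // true
        System.out.println(node.tryAssign("heavyJob", 2)); // false: job cap reached
        System.out.println(node.tryAssign("lightJob", 4)); // true
        System.out.println(node.tryAssign("lightJob", 4)); // true
        System.out.println(node.tryAssign("lightJob", 4)); // false: node slots full
    }
}
```

The per-node slot limit stays a node-level setting, so heterogeneous nodes can still differ, while the job-level cap only narrows what a single job may occupy on any one node.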
Christian Kunz made changes - 10/Nov/08 11:06 PM
Nigel Daley made changes - 20/Nov/08 11:20 PM
Owen O'Malley made changes - 08/Jul/09 04:53 PM
HADOOP-2765 and HADOOP-4035 can be used to control things on a per-job basis.