Key: HADOOP-5170
Type: New Feature
Status: Resolved
Resolution: Won't Fix
Priority: Major
Assignee: Matei Zaharia
Reporter: Jonathan Gray
Votes: 9
Watchers: 27

Hadoop Common

Set max map/reduce tasks on a per-job basis, either per-node or cluster-wide

Created: 04/Feb/09 09:23 PM   Updated: 08/Oct/09 06:41 PM
Component/s: None
Affects Version/s: None
Fix Version/s: 0.21.0

Time Tracking:
Not Specified

File Attachments (text files, all licensed for inclusion in ASF works):
h5170.patch 2009-07-07 08:37 PM Owen O'Malley 16 kB
HADOOP-5170-tasklimits-v3-0.18.3.patch 2009-05-29 07:17 AM Todd Lipcon 22 kB
tasklimits-v2.patch 2009-05-16 10:12 PM Matei Zaharia 6 kB
tasklimits-v3-0.19.patch 2009-05-27 02:58 AM Jonathan Gray 6 kB
tasklimits-v3.patch 2009-05-21 09:59 PM Matei Zaharia 16 kB
tasklimits-v4-20.patch 2009-07-01 04:17 AM rahul k singh 15 kB
tasklimits-v4.patch 2009-06-01 05:47 PM Matei Zaharia 15 kB
tasklimits.patch 2009-05-01 11:55 PM Matei Zaharia 3 kB

Hadoop Flags: Reviewed
Release Note: Job tracker parameters permit setting limits on the number of maps (or reduces) per job and/or per node.
Resolution Date: 09/Jul/09 09:21 PM


Description
There are a number of use cases for limiting the number of map/reduce tasks a single job may run at once. The focus of this JIRA should be on finding the simplest implementation that satisfies the most use cases.

This could be implemented as either a per-node maximum or a cluster-wide maximum. For most uses the former seems preferable; however, either would fulfill the requirements of this JIRA.

Some of the reasons for allowing this feature (mine and from others on the list):

  • I have some very large CPU-bound jobs. I am forced to keep the maximum maps per node at 2 or 3 (on a 4-core node) so that I do not starve the DataNode and RegionServer. I have other jobs that are network-latency bound, and I would like to run a high number of their tasks concurrently on each node. Though I can thread some jobs, some use cases are difficult to thread (scanning from HBase), and threading adds significant complexity to the job compared with letting Hadoop handle the concurrency.
  • Poor assignment of tasks to nodes creates situations where multiple reducers run on a single node while other nodes receive none. A limit of 1 reducer per node for that job would prevent that from happening. (Only works with a per-node limit.)
  • Poor man's MR job virtualization. Since we can limit a job's resources, this gives much more control in allocating and dividing up the resources of a large cluster. (Makes most sense with a cluster-wide limit.)
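
To make the two kinds of limit concrete, here is a minimal Java sketch of how a scheduler might check a per-job per-node cap and a per-job cluster-wide cap before handing out a task. It is only an illustration under stated assumptions: the class `TaskLimitSketch`, its methods, and the `UNLIMITED` convention are hypothetical names invented for this example, and it does not reproduce the code in any of the attached patches or any Hadoop API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-job limit tracker; not taken from the attached patches. */
class TaskLimitSketch {
    /** Assumption of this sketch: -1 means "no limit". */
    static final int UNLIMITED = -1;

    private final int maxPerNode;      // cap on this job's running tasks per node
    private final int maxClusterWide;  // cap on this job's running tasks overall
    private int runningTotal = 0;
    private final Map<String, Integer> runningPerNode = new HashMap<>();

    TaskLimitSketch(int maxPerNode, int maxClusterWide) {
        this.maxPerNode = maxPerNode;
        this.maxClusterWide = maxClusterWide;
    }

    /** True if one more task of this job may start on the given node. */
    boolean canAssign(String node) {
        if (maxClusterWide != UNLIMITED && runningTotal >= maxClusterWide) {
            return false; // cluster-wide cap reached
        }
        int onNode = runningPerNode.getOrDefault(node, 0);
        return maxPerNode == UNLIMITED || onNode < maxPerNode;
    }

    /** Records a task start if both limits allow it; returns whether it started. */
    boolean tryAssign(String node) {
        if (!canAssign(node)) {
            return false;
        }
        runningPerNode.merge(node, 1, Integer::sum);
        runningTotal++;
        return true;
    }

    /** Records that one task of this job finished on the given node. */
    void taskFinished(String node) {
        Integer onNode = runningPerNode.get(node);
        if (onNode == null || onNode == 0) {
            return; // nothing was running there; ignore
        }
        runningPerNode.put(node, onNode - 1);
        runningTotal--;
    }

    public static void main(String[] args) {
        // At most 1 task per node and 2 tasks across the cluster for this job.
        TaskLimitSketch job = new TaskLimitSketch(1, 2);
        System.out.println(job.tryAssign("node-a")); // true
        System.out.println(job.tryAssign("node-a")); // false: per-node cap
        System.out.println(job.tryAssign("node-b")); // true
        System.out.println(job.tryAssign("node-c")); // false: cluster-wide cap
    }
}
```

In this sketch a per-node cap of 1 covers the "one reducer per node" case, while a cluster-wide cap with the per-node cap set to `UNLIMITED` corresponds to the resource-partitioning case.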


Subversion Commits
Repository Revision Date User Message
ASF #781683 Thu Jun 04 08:34:00 UTC 2009 ddas HADOOP-5170. Allows jobs to set max maps/reduces per-node and per-cluster. Contributed by Matei Zaharia.
Files Changed
MODIFY /hadoop/core/trunk/src/mapred/mapred-default.xml
ADD /hadoop/core/trunk/src/test/mapred/org/apache/hadoop/mapred/TestRunningTaskLimits.java
MODIFY /hadoop/core/trunk/CHANGES.txt
MODIFY /hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/JobConf.java
MODIFY /hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/JobInProgress.java

Repository Revision Date User Message
ASF #792700 Thu Jul 09 21:16:07 UTC 2009 omalley HADOOP-5170. Reverting patch.
Files Changed
MODIFY /hadoop/mapreduce/trunk/src/java/mapred-default.xml
MODIFY /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestRunningTaskLimits.java
MODIFY /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/JobConf.java
MODIFY /hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapred/JobInProgress.java
MODIFY /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestRackAwareTaskPlacement.java

Repository Revision Date User Message
ASF #792701 Thu Jul 09 21:17:22 UTC 2009 omalley HADOOP-5170. Removed change log entry because HADOOP-5170 was reverted from
mapreduce.
Files Changed
MODIFY /hadoop/common/trunk/CHANGES.txt

Repository Revision Date User Message
ASF #792704 Thu Jul 09 21:20:19 UTC 2009 omalley HADOOP-5170. Delete java file that was emptied by reverting patch. Also
removing two other empty java files.
Files Changed
DEL /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestRunningTaskLimits.java
DEL /hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/join/ConfigurableInputFormat.java
DEL /hadoop/mapreduce/trunk/src/contrib/streaming/src/test/org/apache/hadoop/streaming/CatApp.java