Hadoop Common / HADOOP-5170

Set max map/reduce tasks on a per-job basis, either per-node or cluster-wide

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Job tracker parameters permit setting limits on the number of maps (or reduces) per job and/or per node.
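
      As an illustration only: a minimal sketch of how a job might have set these limits, assuming the property names discussed in the comments below (mapred.max.{maps|reduces}.per.node and mapred.max.{maps|reduces}.per.cluster, from the since-reverted patch) and the old-style JobConf API:

        import org.apache.hadoop.mapred.JobClient;
        import org.apache.hadoop.mapred.JobConf;

        public class TaskLimitExample {
          public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(TaskLimitExample.class);
            // Property names come from the (later reverted) patch discussed below,
            // not from a released API.
            conf.setInt("mapred.max.maps.per.node", 2);       // at most 2 of this job's maps per node
            conf.setInt("mapred.max.reduces.per.node", 1);    // at most 1 of this job's reduces per node
            conf.setInt("mapred.max.maps.per.cluster", 100);  // at most 100 concurrent maps cluster-wide
            // ... set mapper, reducer, input and output paths as usual ...
            JobClient.runJob(conf);
          }
        }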

      Description

      There are a number of use cases for being able to do this. The focus of this jira should be on finding the simplest implementation that satisfies the most use cases.

      This could be implemented as either a per-node maximum or a cluster-wide maximum. It seems that for most uses the former is preferable; however, either would fulfill the requirements of this jira.

      Some of the reasons for allowing this feature (mine and from others on list):

      • I have some very large CPU-bound jobs. I am forced to keep the max map/node limit at 2 or 3 (on a 4-core node) so that I do not starve the Datanode and Regionserver. I have other jobs that are network-latency bound and would like to be able to run high numbers of them concurrently on each node. Though I can thread some jobs, some use cases are difficult to thread (scanning from hbase), and threading adds significant complexity to the job compared with letting Hadoop handle the concurrency.
      • Poor assignment of tasks to nodes creates some situations where you have multiple reducers on a single node but other nodes that received none. A limit of 1 reducer per node for that job would prevent that from happening. (only works with per-node limit)
      • Poor man's MR job virtualization. Since we can limit a job's resources, this gives much more control in allocating and dividing up resources of a large cluster. (makes most sense w/ cluster-wide limit)
      Attachments

      1. tasklimits.patch
        3 kB
        Matei Zaharia
      2. tasklimits-v2.patch
        6 kB
        Matei Zaharia
      3. tasklimits-v3.patch
        16 kB
        Matei Zaharia
      4. tasklimits-v3-0.19.patch
        6 kB
        Jonathan Gray
      5. HADOOP-5170-tasklimits-v3-0.18.3.patch
        22 kB
        Todd Lipcon
      6. tasklimits-v4.patch
        15 kB
        Matei Zaharia
      7. tasklimits-v4-20.patch
        15 kB
        rahul k singh
      8. h5170.patch
        16 kB
        Owen O'Malley

        Issue Links

          Activity

          Matei Zaharia added a comment -

          Robert, this patch has been reverted, so don't include the release note.

          Robert Chansler added a comment -

          Editorial pass over all release notes prior to publication of 0.21.

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #20 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/20/)

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #22 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/22/). Removed change log entry because it was reverted from mapreduce.

          Owen O'Malley added a comment -

          I reverted this. Matei may move this into the Fair Share Scheduler.

          Arun C Murthy added a comment -

          +1

          Owen O'Malley added a comment -

          This is the patch to roll back the change.

          Devaraj Das added a comment -

          I thought about it and I buy the argument that these knobs should not have been added to the core framework. So +1 for reverting this patch.

          Jonathan Gray added a comment -

          Makes sense. Thanks Matei.

          Matei Zaharia added a comment -

          So in other words, to answer your second question, limiting one job to one task/node will be done as in the current patch (with mapred.max.maps.per.node).

          Matei Zaharia added a comment -

          No, pools are persistent. You submit a job to a particular pool by setting a jobconf property (e.g. set pool.name="my_pool"). Then you'll be able to have caps on total maps or total reduces running for each pool. For example, you could limit your DB import pool to 20 mappers, and then all DB import jobs together will get no more than 20 mappers.

          I was planning to make the per-node limits be on a per job basis as in the current patch. However, for the per-job limits, it seemed to make more sense to let them apply across multiple jobs by placing them on a pool.
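
          As a hedged sketch of what is described above (pool.name is the jobconf property named in this comment; the per-node property comes from the current, later-reverted patch; the per-pool map/reduce caps themselves would be configured on the fair scheduler side, per MAPREDUCE-698, not in the job):

            import org.apache.hadoop.mapred.JobConf;

            public class PoolSubmitExample {
              public static void main(String[] args) {
                JobConf conf = new JobConf(PoolSubmitExample.class);
                conf.set("pool.name", "db_import");          // route the job to the "db_import" pool
                conf.setInt("mapred.max.maps.per.node", 1);  // per-node limit stays a per-job setting
                // Pool-wide caps (e.g. 20 total mappers for the pool) would live in the
                // fair scheduler's pool configuration, not here.
              }
            }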

          Jonathan Gray added a comment -

          So pooling is just a one-time thing when I submit the job? It's not something that persists and I submit things into?

          I'm a big consumer of MR but have been on a need-to-know basis with respect to these things. I guess I now need to know. Again, part of what I liked about this issue/solution was that it's powerful, accessible, and easy to understand. I understand the concerns of larger users and the need to support this... And I would again ask if we could stick it into a corner somewhere so that it's still easy to access but does not get in the way of everything else.

          Otherwise, what I'd be interested in is an explanation / example of how users of this patch might accomplish the same types of things. For example, only allowing a particular job to use one task per node (or even total tasks at a time = total nodes). And at the same time, having other jobs that I allow 10s of tasks per node. I'm not following how that would work.

          Matei Zaharia added a comment -

          You'd be able to place a limit on the whole pool - for example, 100 maps. If you submit one job at a time to the pool, then that job will get the whole limit. However, if you submit two jobs, they will share it, and they will get 100 maps in total, not 100 each. Right now the jobs will do fair sharing, meaning that each job will get 50 maps. However, FIFO scheduling within a pool will also be supported by the fair scheduler in the future (I have a fairly well-tested patch for it that I am porting to trunk).

          Jonathan Gray added a comment -

          I can just put all my jobs into a single pool, and have the same functionality I would have now, correct?

          Matei Zaharia added a comment -

          I've opened MAPREDUCE-704 to add per-node task limits in the fair scheduler. In addition, MAPREDUCE-698 is for per-pool task limits. Are people alright with having per-pool limits instead of per-job limits? Pools allow you to group multiple jobs under the same limit.

          Arun C Murthy added a comment -

          I think we need to take a step back.

          The contention is that the specific knobs in question have long-term ramifications. In particular, putting these in the framework proper, as opposed to specific schedulers, imposes constraints on all schedulers regardless. We are happy to have specific schedulers targeted for specific use-cases/workloads support them, but we are opposed to allowing features which jeopardize one over the other, especially in the rather obvious way we've highlighted above.

          In the past, and possibly in the future, we will make some choices which hurt one use case over another; in such cases we will almost always be biased towards large, multi-user clusters.


          Having said that, there are some workarounds even with the CapacityScheduler (https://issues.apache.org/jira/browse/HADOOP-5170?focusedCommentId=12726751&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12726751) if people are so inclined.

          Scott Carey added a comment -

          > You're right that we can't support Jonathan's use case, but it isn't a valid use case for Hadoop. Hadoop is about sharing a cluster with other users.

          Please don't assume everyone uses Hadoop the same way. Many users are more than willing to tune a specific job flow on a cluster for max compute power per $ and have little or no ad-hoc job submission. For some such a workflow is business critical and the cost/benefit of heavy hand-tuning is obvious.

          Yes, these tuning knobs are not good for all use cases and especially ad-hoc shared cluster use.

          Being able to specifically tune a business critical workflow is a very good thing for some. There is a reason this JIRA gained a lot of watchers fast and has the most votes for it.

          A resource model will be very useful, but it will never beat hand tuning for performance – it will only get part way there – and the two can be cooperative as well.
          Some clusters will be used in ways such that hand tuning is possible, others won't.

          Perhaps in the long term there could be a shared scheduler component that contains all the task limits with a default implementation, and each scheduler can optionally use, or override it.

          Jonathan Gray added a comment -

          @Matei +1

          @Arun I need to dig further into MAPREDUCE-532 before I can make an educated assessment. I'm not that familiar with scheduling and queues; part of the reason I like this issue is the simplicity.

          Matei Zaharia added a comment -

          Jonathan, would you be okay if this feature was placed in the fair scheduler?

          I agree with Owen that it's a good idea to remove task limits from the Hadoop core interface so as not to lock in the API. However, I will support them in the fair scheduler. Apart from the demand for this feature from the community, we have paying customers at Cloudera who asked for this patch and run it on multi-rack clusters.

          Arun C Murthy added a comment -

          > I have thousands of jobs submitted to my cluster every day and prior to this patch I was unable to get to 25% the capacity I have now because of the characteristics of a very small minority of the total number of jobs.

          > I need a way to prevent CPU intensive jobs from eating all the cores. If I have 20 nodes and 100 CPU bound jobs, I only want 1 per node at a time. That strikes me as a common use case, not an invalid one. At the same time I'd like a way to allow 10 jobs per node of network bound jobs that do little else but wait around.

          Jonathan, can you please sketch the scenario you face in more detail?

          I'll hazard a guess and ask you to consider using high-ram jobs for your CPU intensive jobs and use MAPREDUCE-532 to limit how much of the cluster they occupy. (I'm reading that you have a small minority of such jobs from your comment).

          For example, create a 'cpu-heavy' queue and limit its capacity as a fraction of your total capacity using MAPREDUCE-532. Then configure your CPU-intensive jobs to use up a majority of the slots on each node (thus preventing 2 slots of the same job from being scheduled on the same node) and submit these jobs to the 'cpu-heavy' queue.

          Yes, you will not be able to run too many other tasks on those nodes. But with MAPREDUCE-532 you can ensure that it doesn't affect too many nodes in your cluster.

          Will that work?
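
          For concreteness, a hedged sketch of the queue-based workaround described above (it assumes a 'cpu-heavy' queue has already been defined for the CapacityScheduler and that jobs pick their queue via the standard mapred.job.queue.name property; the high-RAM slot settings from MAPREDUCE-516/MAPREDUCE-532 vary by version and are omitted):

            import org.apache.hadoop.mapred.JobConf;

            public class CpuHeavyQueueExample {
              public static void main(String[] args) {
                JobConf conf = new JobConf(CpuHeavyQueueExample.class);
                // Send the CPU-intensive job to the dedicated, capacity-limited queue.
                conf.set("mapred.job.queue.name", "cpu-heavy");
                // Each task would also be configured as "high-RAM" so it occupies a
                // majority of a node's slots, as suggested above (settings omitted).
              }
            }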

          Chris K Wensel added a comment -

          > Hadoop is about sharing a cluster with other users.

          I beg to differ. Many of my users run many concurrent single-purpose Hadoop clusters in AWS, each tuned and sized to the particular load. I believe HOD exists for this purpose as well, to some extent.

          Re: this patch.

          Many users, I expect the bulk of Hadoop users, have small constrained clusters and are happy with hand-crafting their workloads to get the best utilization and performance. Whether or not this patch is 'correct', reading above I get the impression it is being used and the users find it useful. Flat out reverting it seems heavy-handed without an alternative available.

          Jonathan Gray added a comment -

          I don't see how my use case is invalid.

          And if decisions are always made for the betterment of very large clusters with multiple users rather than smaller clusters with a single user (or at least, in a controlled environment), you're going to alienate a vast majority of new users and even seasoned users who happen to not care about multi-user environments.

          I don't particularly care how I get done what I need to get done. But I have thousands of jobs submitted to my cluster every day and prior to this patch I was unable to get to 25% the capacity I have now because of the characteristics of a very small minority of the total number of jobs.

          I need a way to prevent CPU intensive jobs from eating all the cores. If I have 20 nodes and 100 CPU bound jobs, I only want 1 per node at a time. That strikes me as a common use case, not an invalid one. At the same time I'd like a way to allow 10 jobs per node of network bound jobs that do little else but wait around.

          From a boots on the ground perspective, I can say that this jira comes up more than once a week in discussion with new users, etc. It's easy to understand and gives the user an enormous amount of control with very few knobs. At least stick this functionality into a corner somewhere so those of us who want it can still use it. When there is more mature scheduling down the road, I'm more than happy to switch.

          Owen O'Malley added a comment -

          I think those limits are a very bad idea. If you think they are useful, I'd be ok with putting them into the fair share scheduler. Then you can experiment with them and see if they are useful outside of a single user research cluster. I suspect you'll quickly discover the lack of utility in this patch. I certainly don't think this is appropriate to go into the main map/reduce framework.

          You're right that we can't support Jonathan's use case, but it isn't a valid use case for Hadoop. Hadoop is about sharing a cluster with other users. It is not a personal supercomputer operating system. (Although Arun and I did have a personal Hadoop supercomputer for a couple months while running the sort benchmarks.)

          I plan on reverting this and closing this as wont fix.

          Arun C Murthy added a comment -

          > Yes, it will only had a hard-limit per-queue. This feature is aimed at solving a certain class of problem alone i.e. limiting fan-in for a specific resource.

          s/had/add

          Arun C Murthy added a comment -

          > Arun, can you explain what the hard limit on capacity means? Is it option 2 in MAPREDUCE-532 that just disallows a queue from taking excess capacity?

          Yes, it will only had a hard-limit per-queue. This feature is aimed at solving a certain class of problem alone i.e. limiting fan-in for a specific resource.

          Matei Zaharia added a comment -

          Arun, can you explain what the hard limit on capacity means? Is it option 2 in MAPREDUCE-532 that just disallows a queue from taking excess capacity? I'm still not sure how this can solve the problems you brought up. In fact, as I pointed out in my earlier comment, even without any limits defined, MAPREDUCE-516 can harm locality and utilization in its current form. Am I missing something about per-job limits that makes them different from queue limits or from no excess capacity being available?

          Owen, I agree that having MAPREDUCE-516 be in terms of multi-slot tasks would solve some issues, but it doesn't sound like it would solve Jonathan's problem for example. I think that coming up with a general multi-resource sharing model will be difficult. Is there anything we can do in the meantime to support this use case? For example, it would be trivial for me to implement the per-node limits in the fair scheduler, but then they wouldn't be available to users of other schedulers.

          This might also be a good time to figure out exactly what scheduling functionality can go into JobInProgress/TaskTracker/etc and what should go into schedulers. I didn't think the limits in this patch added much complexity. They obey the contract of obtainNewMapTask, which is that it may or may not return a task. Schedulers already have to deal with jobs that have no tasks to launch on one heartbeat, and then have tasks on the next, because of speculation. So any scheduler that works with that should more or less be okay if the job chooses to launch a task based on its total number of running tasks. If we agreed on some kind of contract, then we would be able to implement common scheduling functionality in the mapreduce package rather than having it be contrib. Otherwise, as long as there are multiple groups working on scheduling on Hadoop, everyone will be worried that someone else's change will break their future work.
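
          A rough sketch of the contract argument above: a job that is at its limit simply declines to hand out a task, and returning no task is already a legal outcome of obtainNewMapTask that every scheduler must tolerate. The signature is simplified and the field and helper names (maxMapsPerNode, maxMapsPerCluster, runningMapsOn, runningMapTasks, findAndScheduleMapTask) are illustrative, not the patch's actual code:

            // Inside a JobInProgress-like class (illustrative only).
            public Task obtainNewMapTask(TaskTrackerStatus tts, int clusterSize) {
              // Per-job limit checks: returning null here looks, to the scheduler,
              // just like "no runnable task right now", which it must already handle.
              if (maxMapsPerNode > 0 && runningMapsOn(tts.getHost()) >= maxMapsPerNode) {
                return null;
              }
              if (maxMapsPerCluster > 0 && runningMapTasks >= maxMapsPerCluster) {
                return null;
              }
              return findAndScheduleMapTask(tts, clusterSize);  // normal task selection
            }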

          Arun C Murthy added a comment -

          @Dhruba: http://developer.yahoo.com/hadoop/distribution/
          dhruba borthakur added a comment -

          @Arun: Is the yahoo 0.20 branch (that you refer to) an existing branch in apache svn? How soon can I access it?

          Arun C Murthy added a comment -

          Yes, it should be ready in a day or two. We will release a patch for the yahoo 0.20 branch too.

          Tom White added a comment -

          Thanks for the clarification Arun. Are you targeting MAPREDUCE-532 for 0.21?

          Arun C Murthy added a comment -

          > If the capacity scheduler will need to support cluster-wide task limits through MAPREDUCE-532 eventually, why can't it support them through this per-job property?

          Sorry, I should have clarified that we are simplifying the scope of MAPREDUCE-532 to implement a 'hard-limit' on a queue. Thus, we can limit the fan-in to the resource in question via the hard-limit on the special queue for the resource.

          Owen O'Malley added a comment -

          This is not resolved and will likely be reverted.

          Owen O'Malley added a comment -

          The problem with this patch is that it doesn't do anything useful and gets in the way of real fixes to the problem. Limiting the number of running tasks per job is not what any users need. It is being used as an expedient way to hack in global resource limits. If I launch two cpu hungry jobs (even with a single user!), I will kill those nodes. That is not ok. While MAPREDUCE-516 addresses high-ram jobs, it takes the view that each slot has a set of resources and that a given job's tasks may require multiple slots. This is a reasonable model that is applicable for cpu, ram, or disk. It would be reasonable to change the interface of MAPREDUCE-516 to be in terms of slots per task instead of memory.

          Clearly, in the long term, I think we need a more dynamic model that tracks the resources being consumed and launches new tasks appropriately. In the meantime, MAPREDUCE-516 is a much better approach to this.

          > This is an optional feature - users don't have to use it and it is turned off by default. It has no performance impact for those who don't enable it. On the other hand there are a substantial number of users who are using it (or interested in using it), and would be left with no immediate alternative if it were pulled from the next release.

          > We can leave this feature in until there is a suitable replacement, after which time it can be deprecated so users can migrate to the replacement. Could that work?

          -1 Because once it is in, we need to support it. When it doesn't work for the users, they'll try and fix it. We already have users confused by too many knobs...

          Tom White added a comment -

          This is an optional feature - users don't have to use it and it is turned off by default. It has no performance impact for those who don't enable it. On the other hand there are a substantial number of users who are using it (or interested in using it), and would be left with no immediate alternative if it were pulled from the next release.

          We can leave this feature in until there is a suitable replacement, after which time it can be deprecated so users can migrate to the replacement. Could that work?

          Matei Zaharia added a comment -

          If the capacity scheduler will need to support cluster-wide task limits through MAPREDUCE-532 eventually, why can't it support them through this per-job property?

          I fully agree that per-node limits aren't a solution to resource sharing in a multi-user environment. However, most Hadoop clusters outside Yahoo, Facebook and a few other large installations are essentially single-user. For these clusters, the per-node limit solves real problems until the time when we introduce a resource model to Hadoop (which will be a serious undertaking). This is why there were many votes on this feature. Whenever I've talked to people about the fair scheduler, task limits have been one of the most requested features.

          Arun C Murthy added a comment -

          > If the patch interacts poorly with the capacity scheduler, wouldn't it make more sense to tell users of the capacity scheduler not to use this feature, or to improve MAPREDUCE-516 so that it can take into account task limits?

          Like I said, this patch introduced 2 features: per-node limits and per-job limits. The per-job cluster limit interacts poorly with the MAPREDUCE-516, not per-node limits.

          > The situation described in the original task was that a task is very CPU-intensive. The user wanted to limit the number of those tasks running to less than the number of cores. However, there is no need to fill up all slots on the machine - the other slots could be used for less CPU-intensive tasks. Until there is a model for all system resources in the capacity scheduler, this patch lets users with that kind of problem achieve reasonable behavior.

          We are missing the forest for the trees here. The user whose task is CPU-intensive is happy. But what about the other users whose tasks need CPU too? How do we keep their tasks from starving on the same node? In particular, there are no checks and balances preventing multiple CPU-intensive tasks from being scheduled on the same node. If we knew that the other tasks were going to be IO intensive we could co-schedule them on this node, but we don't. This is the reason why Owen and I continue to insist that per-node task limits are a poor substitute for modelling resource usage and that a resource model is a pre-requisite for this feature.

          This feature works well for clusters with single users, but not in shared clusters.

          > One situation in which you want to limit tasks on the whole cluster is if you have a housekeeping job with long tasks, e.g. a database import or export. If you don't have a limit on running tasks, such a job can take up the whole cluster and hold it for a long time.

          The short-term fix is to submit these jobs to a special queue with limited capacity, possibly to queues with a hard upper-limit on their capacity: MAPREDUCE-532.

          Jonathan Gray added a comment -

          I am using this patch on 0.19 and 0.20, and will have to continue patching it if taken out of the 0.21 release. I've also recommended this patch and know of others who are using it... It solves a number of scheduling issues we have.

          The idea of having "heavy" tasks works great for high-memory jobs. But we have a number of network-io bound jobs and cpu bound jobs. We can make the network bound jobs light, and cpu bound jobs heavy, but what we really want is to have one cpu bound job run per node (we have one core for it), and lots of network jobs per node... simultaneously. With task weights, how can I prevent the cpu bound job from taking up all the slots and letting in high numbers of network bound jobs at the same time?

          Matei Zaharia added a comment -

          I wrote this patch because of demand for this feature from many users, including Cloudera customers, who had reasons to limit the number of tasks per job. If the patch interacts poorly with the capacity scheduler, wouldn't it make more sense to tell users of the capacity scheduler not to use this feature, or to improve MAPREDUCE-516 so that it can take into account task limits?

          I agree that task limits are not the best solution for dealing with high-memory jobs, but that's not the only situation that this patch addresses. Here are some others:

          • The situation described in the original task was that a task is very CPU-intensive. The user wanted to limit the number of those tasks running to less than the number of cores. However, there is no need to fill up all slots on the machine - the other slots could be used for less CPU-intensive tasks. Until there is a model for all system resources in the capacity scheduler, this patch lets users with that kind of problem achieve reasonable behavior.
          • One situation in which you want to limit tasks on the whole cluster is if you have a housekeeping job with long tasks, e.g. a database import or export. If you don't have a limit on running tasks, such a job can take up the whole cluster and hold it for a long time. The capacity and fair schedulers include a preemption feature to kill tasks, but preemption has some problems: First, the preemption timeout generally shouldn't be set to less than 5-10 minutes so it doesn't interact badly with faulty tasktracker timeouts, and second, preemption might complicate writing certain jobs (e.g. database import/export).

          I also think the problems Arun brought up in the capacity scheduler aren't insurmountable. To deal with the locality problem, you can give up slots when you run out of local data on them, like MAPREDUCE-548. To deal with the pinning problem, preemption might help.

          More importantly, I think Arun's issues will come up even if task limits aren't used. For example, suppose that a given queue's capacity is 25% of the cluster, say 250 slots, and that all the other queues are filled with jobs, so there is never excess capacity. Then if a high-memory job is submitted to the queue with 25% capacity, it has to behave exactly as if it has a limit of 250 tasks cluster-wide. The same problems that Arun brought up will happen: locality will be poor, the job might make bad bets, etc. This situation will be far more common than a user explicitly setting a limit on running tasks for the job, so if these two concerns are serious, maybe more work needs to be put into MAPREDUCE-516 to prevent them from happening. In particular, I think limiting the number of "bets" based on the job's capacity or task limit, allowing the job to kill tasks if it has waited for too long on a bet, and implementing MAPREDUCE-548 would solve much of the problem.

          Arun C Murthy added a comment -

          > I'm sorry I'm coming in late on this.

          Ditto.

          > Are these new knobs at the job level? I think this is the wrong direction. In particular, just limiting the number of slots won't do any good. The high ram job processing is a much better model. So rather than declaring a max number of maps or reduces, we should allow "large task" jobs where each task is given multiple slots.

          Couldn't agree more. See MAPREDUCE-516.

          I see the following issues:

          mapred.max.{maps|reduces}.per.node

          As of today, the framework will not be able to guard against users who take up a node and consume all resources (cpu, memory, disk etc.) and starve other users' tasks running on the machines. This goes against the spirit of shared compute/storage clusters. I can see an argument being made for this feature once we can figure out how to charge the user based on the task's total resource consumption; however we are a long way away from this. We have taken a step along this direction by introducing the High RAM Jobs feature in the Capacity Scheduler, but we have a long way to go.

          mapred.max.{maps|reduces}.per.cluster

          Given that we mostly agree that the high-ram jobs are the right model, these features interact very badly with each other.

          Consider a high-ram job which has mapred.max.maps.per.cluster set to 100 and a few thousand tasks (a fraction of which is sufficient to exhaust the capacity of its queue).
          Currently the CapacityScheduler starts 'reserving' slots (after charging the user, queue etc.) to satisfy the resource requirements of the job.

          Now we have 2 choices if we choose to incorporate mapred.max.{maps|reduces}.per.cluster:

          1. Do not 'reserve' more tasktrackers than mapred.max.maps.per.cluster
          2. Reserve as much as needed but start 'unreserving' as soon as we have scheduled 'mapred.max.maps.per.cluster' number of maps.

          Either way we are in trouble.

          1. The high-ram job is seriously starved. The scheduler has to pick only 100 nodes and no more. If they happened to be bad bets (other long running tasks etc.) the high-ram job needs to wait for a long while. Furthermore, the other bad side-effect is that the high-ram job gets pinned to the first 100 nodes and this really hurts locality of its tasks.
          2. The cluster is severely under-utilized since we may reserve all of the queue's capacity and then realize that we have to start 'unreserving'. Once the first wave of maps of the high-ram job is done we rinse and repeat for each wave of mapred.max.maps.per.cluster maps, thereby keeping the cluster idle for large amounts of time.

          > I think we should revert this before it is released.

          Unfortunately, yes. +1

          Owen O'Malley added a comment -

          I'm sorry I'm coming in late on this. Are these new knobs at the job level? I think this is the wrong direction. In particular, just limiting the number of slots won't do any good. The high ram job processing is a much better model. So rather than declaring a max number of maps or reduces, we should allow "large task" jobs where each task is given multiple slots. I think we should revert this before it is released.

          rahul k singh added a comment -

          patch for yahoo internal repo

          Matei Zaharia added a comment -

          For those interested in using this in older branches, Todd and Jonathan's patches should work. Jonathan's 0.19 patch should also work in the 0.20 branch, I believe.

          Devaraj Das added a comment -

          I just committed this. Thanks, Matei!
          This cannot be committed to the older branches since it's a new feature (as opposed to a regression).

          Matei Zaharia added a comment -

          The release audit says the following:

          437d436
          <      [java]  !????? /home/hudson/hudson-slave/workspace/Hadoop-Patch-vesta.apache.org/trunk/build/hadoop-781115_HADOOP-5170_PATCH-12409578/src/mapred/mapred-default.xml.orig
          

          What does this mean? Is it just complaining that I modified mapred-default.xml?

          The failing contrib test seems to be unrelated (HADOOP-5869).

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12409578/tasklimits-v4.patch
          against trunk revision 781115.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          -1 release audit. The applied patch generated 493 release audit warnings (more than the trunk's current 492 warnings).

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/451/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/451/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/451/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/451/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/451/console

          This message is automatically generated.

          Matei Zaharia added a comment -

          Also, I haven't been able to figure out what the release audit warning is (if it was valid) because the link posted by Hudson doesn't work. Let's see if it comes up again. I don't see any Java warnings generated by my patch so I'm not sure what it could be.

          Matei Zaharia added a comment -

          Here's a new patch that refactors the limit checking logic into a single method. I've also removed the test for limits being larger than slot counts because that is tested in other mapred tests (where we do not have any task limits set).

          Matei Zaharia added a comment -

          I had to make the waits big because the inter-heartbeat interval is big. I can remove some of the test cases though (having just the first 2 might be enough). I'd really rather not dive into modifying test harnesses as part of this patch, but I think that would be a great feature to add to MiniMRCluster in a separate JIRA.

          Devaraj Das added a comment -

          I have a minor nit - the code in JobInProgress.findNewMapTask/findNewReduceTask that this patch adds is very similar and could probably be factored out into a separate method with the appropriate args.
          Other than that, in the testcase there are big waits (and the testcase takes ~3 minutes to run). Are they required to be so long? Also, in general, we should move to the model of spoofing heartbeats (and faking other objects) in such testcases, but I won't hold this patch up for that (unless there is enthusiasm to modify the test in that direction).

          Todd Lipcon added a comment -

          Attached is a backport of this patch against 0.18.3 including the new unit tests, which pass. There are some backports here of new test code in UtilsForTests, etc, required for the tests to run.

          Todd Lipcon added a comment -

          Sorry, false alarm. Looks like TestJobHistory is currently flaky (HADOOP-5920).

          Todd Lipcon added a comment -

          It looks like this broke the TestJobHistory test. I checked out trunk and it passed, then applied this patch and it failed.

          Jonathan Gray added a comment -

          I'm still on 0.19 and really need this patch.

          Attached is a patch that applies cleanly to the current 0.19 branch. Note: this removes the unit test! I tried to get the test to work in 0.19 but eventually turned to doing manual testing on my cluster with my jobs.

          All my testing shows that the patch works as advertised.

          I'm moving this into a busy production system and will report back if there are any issues.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12408747/tasklimits-v3.patch
          against trunk revision 777761.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          -1 release audit. The applied patch generated 492 release audit warnings (more than the trunk's current 491 warnings).

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/392/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/392/artifact/trunk/current/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/392/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/392/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/392/console

          This message is automatically generated.

          Matei Zaharia added a comment -

          Here's a new patch with unit tests. I used MiniMRCluster because it's hard to test obtainNewMapTask/ReduceTask directly without duplicating a lot of the code in MiniMRCluster (setting up a JobTracker and a TaskTracker).

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12408326/tasklimits-v2.patch
          against trunk revision 776904.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          -1 release audit. The applied patch generated 492 release audit warnings (more than the trunk's current 491 warnings).

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/367/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/367/artifact/trunk/current/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/367/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/367/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/367/console

          This message is automatically generated.

          Tom White added a comment -

          +1

          The documentation changes will go to http://hadoop.apache.org/core/docs/current/mapred-default.html at the next release.

          Regarding unit testing, can you unit test JobInProgress's obtainNewMapTask() and obtainNewReduceTask() directly?

          Matei Zaharia added a comment -

          Here's a new patch with a few changes suggested by Tom White:

          • Changed the default values from Integer.MAX_VALUE to -1 (which is made to signify no limit)
          • Added documentation for the new parameters in src/mapred/mapred-default.xml. Is this the right file for the documentation to propagate to the website?
          • Changed check for per-node limit to only iterate through running tasks if there is a limit set.

          I also renamed the "per cluster" limit parameters to mapred.running.map.limit and mapred.running.reduce.limit, which I think are clearer names (you only have one cluster, so "per cluster" sounds strange).
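
          For illustration only (this is not part of the patch), setting these limits on a job through the old mapred API could look roughly like the sketch below; the class name, job name, and specific values are hypothetical, and -1 (the default) means no limit:

          import org.apache.hadoop.mapred.JobClient;
          import org.apache.hadoop.mapred.JobConf;

          public class TaskLimitConfigExample {
            public static void main(String[] args) throws Exception {
              JobConf conf = new JobConf(TaskLimitConfigExample.class);
              conf.setJobName("task-limit-example");
              // -1 (the default) means no limit for any of these settings.
              conf.setInt("mapred.running.map.limit", 100);    // running maps, cluster-wide
              conf.setInt("mapred.running.reduce.limit", 20);  // running reduces, cluster-wide
              conf.setInt("mapred.max.maps.per.node", 1);      // this job's maps per tasktracker
              // ... remaining job setup (input/output paths, mapper, reducer) omitted ...
              JobClient.runJob(conf);
            }
          }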

          Jean-Daniel Cryans added a comment -

          "It will solve many users' problems in the short term, as evidenced by the number of votes and watchers."

          Indeed, I know about many shops that use their cluster for one job at a time and that need that kind of sweet feature.

          Matei Zaharia added a comment -

          Hemanth, if you submit another job that is also CPU-bound, it may interfere with the first. However, if you submit one that is IO-bound, it will be fine. This task limit feature isn't meant to solve the general resource allocation problem, only to give you a way to limit resource consumption if you know that you have one job with very resource-intensive tasks and many jobs with less resource-intensive tasks. Because it's such a simple feature, I think it's a good one to add before building any kind of automatic resource-aware scheduling. It will solve many users' problems in the short term, as evidenced by the number of votes and watchers.

          Hemanth Yamijala added a comment -

          Matei, I wanted to understand this a bit more.

          Another use case that I heard of is the following: If I have a CPU bound job, I may want to restrict the number of tasks that run on a node, so I can use the cores all for myself. So, if I set mapred.max.maps.per.node for this job appropriately, the patch makes sure that only those many tasks of this job are scheduled on the node, even if slots are free.

          However, another job that has no limits set can get scheduled on the free slots. Then the exclusiveness is not met, right? So, is the above use case not handled by this patch, or can it still be accomplished in some way?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12407054/tasklimits.patch
          against trunk revision 770685.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/282/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/282/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/282/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/282/console

          This message is automatically generated.

          Jonathan Gray added a comment -

          I'm not intimately familiar with this area of the codebase, but I filed the issue.

          The approach seems simple and straightforward to me, +1. Patch looks good. I had the impression this would be a bit harder to do than that.

          These four settings should fulfill all the different use cases I've heard for this issue.

          I will test the latest patch on Monday.

          Thanks Matei!

          Matei Zaharia added a comment -

          Here is a start at a patch for this issue. I added limits on running maps and reduces in the form of four parameters:

          • mapred.max.maps.per.cluster
          • mapred.max.reduces.per.cluster
          • mapred.max.maps.per.node
          • mapred.max.reduces.per.node

          All the limits start at infinity by default (meaning no limit other than the number of slots on the node, as happens today).

          These limits are located in JobInProgress and affect whether obtainNewMapTask and obtainNewReduceTask succeed. They will therefore work with any job scheduler (default FIFO scheduler, fair scheduler or capacity scheduler). For example, setting the per-cluster limit for a job under the FIFO scheduler will mean that this job will consume a certain number of slots (even if it has more tasks than this number of slots), and the other slots can be used by later jobs in the queue.

          Let me know whether this approach looks good and whether the names for the parameters make sense. I can then maybe move the parameter strings into JobConf methods so they don't appear right in JobInProgress.
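
          To make the mechanism concrete, below is a minimal, purely illustrative sketch of the kind of check described above. It is not the actual patch; the class, method, and parameter names (RunningMapLimitSketch, canLaunchMap, runningMapsOnNode) are invented for illustration, and limits default to Integer.MAX_VALUE, i.e. effectively unlimited, matching the "infinity by default" behavior described in this comment:

          // Illustrative sketch only: a helper a job could consult before
          // handing out another map task. An analogous check would apply to reduces.
          public class RunningMapLimitSketch {
            private final int maxMapsPerCluster;  // e.g. mapred.max.maps.per.cluster
            private final int maxMapsPerNode;     // e.g. mapred.max.maps.per.node

            public RunningMapLimitSketch(int maxMapsPerCluster, int maxMapsPerNode) {
              this.maxMapsPerCluster = maxMapsPerCluster;
              this.maxMapsPerNode = maxMapsPerNode;
            }

            /**
             * @param runningMaps       this job's maps currently running cluster-wide
             * @param runningMapsOnNode this job's maps currently running on the
             *                          tasktracker requesting a task
             * @return true if the job may launch another map on that tasktracker
             */
            public boolean canLaunchMap(int runningMaps, int runningMapsOnNode) {
              return runningMaps < maxMapsPerCluster     // cluster-wide limit
                  && runningMapsOnNode < maxMapsPerNode; // per-node limit
            }
          }

          Because a check of this kind sits with the job rather than with a particular scheduler, any scheduler asking the job for a new task would see the limit enforced, as described above.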

          Nathan Marz added a comment -

          My use case is a job with a custom output format which utilizes local disk heavily. Jobs that should take half an hour with one task per node are taking 4 hours due to disk contention. There's no feature I want more in Hadoop than this one.

          ryan rawson added a comment -

          I'm also interested in this - we want to import from a database using map-reduce, but it'd be nice to tune on a per-job basis.

          Jonathan Gray added a comment -

          +1 on Chris' thoughts above.

          Ideally we'd like to have as much control as possible (per job, max concurrent tasks across cluster and/or per node). Either of these would satisfy my requirements, so if one fits more easily into the existing scheduler/jobtracker, I think we should go after that approach.

          The approach of Vinod does not help in my base use case because I want to run on average 1 CPU-bound task per node (cluster max = nodes), but for my network-latency-bound jobs I'd like to run 5 or more per node (cluster max = 5 * nodes). We are using threading in some places, but it adds significant complexity to many already complex MapReduce jobs.

          Chris K Wensel added a comment -

          Yes, a limit on the number of concurrently running tasks, cluster-wide, specified by and enforced against a given job would be great.

          But having the option to set the number of concurrently running tasks per node for a given job is also important.

          Nathan thinks this might help with his issue as well on HADOOP-5160.

          So a job with a million tasks, only 100 can run concurrently in the cluster, and only 1 can run per node.

          dhruba borthakur added a comment -

          This jira is probably asking for a feature that allows limiting the number of concurrently running tasks of a job. A job could have a million tasks, but if we can somehow tell the system to not schedule more than 100 tasks of this job concurrently, then your problem is solved, isn't it?

          Chris K Wensel added a comment -

          It looks like mapred.jobtracker.maxtasks.per.job is a global value. Would be interesting to have each job optionally override this value for itself.

          Vinod Kumar Vavilapalli added a comment -

          There is already a cluster-wide limit, mapred.jobtracker.maxtasks.per.job, putting a limit on the number of tasks a job can have. This is mainly used for limiting the JT's memory (HADOOP-4018), but it should serve your purpose too.

          As for limiting the number of tasks of a job on each node, the same was being talked about in HADOOP-4295; you may want to see the discussion there.


            People

            • Assignee: Matei Zaharia
            • Reporter: Jonathan Gray
            • Votes: 9
            • Watchers: 30
