Hadoop Common / HADOOP-4768

Dynamic Priority Scheduler that allows queue shares to be controlled dynamically by a currency

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      New contribution: the Dynamic Scheduler implements dynamic priorities with a currency model. Usage instructions are in the Jira item.

    Description

    Dynamic (economic) priority scheduler based on work presented at the Hadoop User Group meeting in Santa Clara in September and at HadoopCamp in New Orleans in November 2008.

    Attachments

      1. HADOOP-4768-6.patch
        95 kB
        steve_l
      2. HADOOP-4768-5.patch
        99 kB
        steve_l
      3. HADOOP-4768-4.patch
        94 kB
        Thomas Sandholm
      4. HADOOP-4768-3.patch
        92 kB
        steve_l
      5. HADOOP-4768-2.patch
        93 kB
        Thomas Sandholm
      6. HADOOP-4768.patch
        37 kB
        Thomas Sandholm
      7. HADOOP-4768-fairshare.patch
        2 kB
        Thomas Sandholm
      8. HADOOP-4768-capacity-scheduler.patch
        2 kB
        Thomas Sandholm
      9. HADOOP-4768-dynamic-scheduler.patch
        29 kB
        Thomas Sandholm

          Activity

          Thomas Sandholm added a comment -

          Dynamic priority scheduler and test classes
          Capacity scheduler patch to use dynamic capacity guarantees
          Fairshare scheduler patch to use dynamic pool shares

          Matei Zaharia added a comment -

          Hi Thomas,

          I haven't had a chance to look at your code in detail, but here are some quick thoughts.

          First of all, this sounds like an interesting way to allocate shares, so the main question is how to implement it. In that regard, I'm not sure that adding a "meta-scheduler" is the cleanest approach, because it changes the way the existing schedulers are invoked (which probably doesn't matter with the current scheduler API, but may mean worrying about what to do if there's a meta-scheduler between us and the jobtracker later) and couples the implementation of these schedulers tightly with the dynamic priority scheduler (if the TaskScheduler API has some methods added to it, and we want to use them in the fair or capacity schedulers, we have to modify the dynamic priority scheduler too). Instead, I'm wondering whether it's possible to implement your functionality in a different manner: have the dynamic priority scheduler be an external process which modifies the config file of the capacity or fair schedulers, so that the latter don't have to know anything about it at all. This seems to be essentially how you communicate with them anyway. The other advantage of this is that if the schedulers change exactly how they compute allocations, they don't need to worry about what to do with the dynamic share file vs the scheduler config file - that is, there isn't that dependency where the scheduler has to look at two config files. Would this be a reasonable approach?

          I also have a suggestion about your implementation for the fair scheduler. Currently you are modifying the mapAllocs/reduceAllocs, but I'd like to point out that these are actually guaranteed shares (minimum shares), and not the fair shares used for distributing excess capacity. The difference between these is that the scheduler has flexibility in when it gives a job slots towards its fair share, but it must meet the guarantee at all times. In a little bit, the patch at HADOOP-4667 will let the fair scheduler use this flexibility to increase locality for jobs that are at their guaranteed share but not yet at their fair share, by letting them wait for local slots, which will improve performance. In other words, the min share is an absolute guarantee, while the fair share is something you'll get on average, which gives the scheduler more wiggle room to improve performance. So depending on your goal - do you want strict guarantees or fuzzy ones - it would be good to consider setting the pools' fair shares rather than their allocations. In the current trunk version of the fair scheduler this is not possible, but HADOOP-4789 adds a "weight" parameter that lets you do this. You may even charge people differently for fair shares vs guaranteed shares.
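          To make the min-share/weight distinction concrete, here is what a fair scheduler allocations file combining the two might look like once the HADOOP-4789 weight parameter is available; the pool name and numbers are made up for illustration:

            <?xml version="1.0"?>
            <allocations>
              <!-- Hypothetical pool: the min slots are a hard guarantee
                   (minimum share); the weight only biases how excess
                   capacity is distributed, i.e. the "fuzzy", on-average
                   fair share. -->
              <pool name="research">
                <minMaps>10</minMaps>
                <minReduces>5</minReduces>
                <weight>2.0</weight>
              </pool>
            </allocations>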

          Just out of curiosity, what is the use case for which you've designed this scheduler - is it something you require at HP? (In my work as a CS grad student I'm interested in what requirements people have for MapReduce schedulers, and I'd like to hear about other people's use cases of shared Hadoop clusters.)

          Thomas Sandholm added a comment -

          Hi Matei,

          When I was implementing this I played around with a number of different approaches. The goals were to make the dynamic scheduler as independent of the underlying schedulers as possible, and to require as few changes as possible to them. I didn't want to add a separate deployed service, as this introduces another point of failure and maintenance. So I hooked into the schedulers' regular event/bookkeeping loop, without even requiring a separate thread to be spawned. In terms of updating the config files of the schedulers (pools file and capacity-scheduler) asynchronously without requiring any changes at all to the schedulers, it is something I also tried, and it was quite simple for the fairshare scheduler, but it introduces a dependency on the XML format if you don't want to do some XPath-like replacement (which turned out to be both too complex and too slow for our purpose). Updating the config files would lead to more I/O overhead too; now the shares are communicated directly to the schedulers in memory. I don't think the reverse dependency is too bad either: the scheduler just gets a list of queue/share values from a config property and can then utilize those in whatever way makes sense to the local scheduler. My patches to the capacity scheduler and the fairshare scheduler should rather be seen as examples for scheduler developers of how to utilize the dynamic scheduler, rather than as final solutions.

          The important thing is that the dynamic scheduler allows control over and accounts for budget spent on different levels of quality of service/priority. This QoS/priority can then be enforced and implemented in any number of ways, the dynamic scheduler doesn't care, as long as spending more currency per time unit will give you better performance.
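
          As a rough illustration of this currency model (an editor's sketch, not code from the patch), a proportional-share rule can make each queue's share its spending rate divided by the total spending in the interval, drawing the spend down against the queue's remaining budget:

            import java.util.HashMap;
            import java.util.Map;

            /**
             * Toy sketch of a proportional-share currency model (not the
             * patch's actual classes): each queue bids a spending rate per
             * accounting interval; its cluster share is its bid divided by
             * the sum of all bids, and the bid is deducted from its budget.
             */
            public class CurrencyModelSketch {
              private final Map<String, Float> budgets = new HashMap<>();
              private final Map<String, Float> spendingRates = new HashMap<>();

              public void setQueue(String queue, float budget, float spendingRate) {
                budgets.put(queue, budget);
                spendingRates.put(queue, spendingRate);
              }

              /** Computes shares for one interval and charges each queue's budget. */
              public Map<String, Float> chargeAndAllocate() {
                float totalSpending = 0f;
                for (Map.Entry<String, Float> e : spendingRates.entrySet()) {
                  // A queue can only spend what remains of its budget.
                  totalSpending += Math.min(e.getValue(), budgets.get(e.getKey()));
                }
                Map<String, Float> shares = new HashMap<>();
                for (Map.Entry<String, Float> e : spendingRates.entrySet()) {
                  String queue = e.getKey();
                  float spend = Math.min(e.getValue(), budgets.get(queue));
                  budgets.put(queue, budgets.get(queue) - spend);
                  shares.put(queue, totalSpending > 0 ? spend / totalSpending : 0f);
                }
                return shares;
              }
            }

          Under this rule, doubling your spending rate while others hold theirs fixed roughly doubles your share, which is exactly the "pay more currency per time unit, get better performance" property described above.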

          Thanks for the more detailed info on the fairshare scheduler, I still think that the guaranteed allocations were the best match, but if it makes sense to pay more currency for higher fair-shares you could enforce the shares granted by the dynamic scheduler in a more sophisticated way. I don't think the interface between the schedulers has to change for this to be done though.

          One use case is that you could hook this feature into a secure banking system where budgets can be transferred from the user to the cluster owner automatically. We have used this approach successfully in a system called Tycoon (http://tycoon.hpl.hp.com), but instead of allocating map/reduce task slots it allocates virtual machine shares using Xen (like EC2, but with variable pricing and finer-grained resource control).

          Another use case is a cloud computing testbed that we are designing together with Intel and Yahoo (which I presented at the venues mentioned in the patch description). In this scenario researchers are granted some quota, e.g. based on their contribution to the testbed. The quota can then be used by them to obtain resources when they need them, at a QoS level that matches their needs.

          Hope this clarifies things a bit. If you want more info on the big picture, you can look at some of the papers and presentations on the Tycoon site mentioned above or the testbed site, www.opencirrus.org (under construction).

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12395248/HADOOP-4768-fairshare.patch
          against trunk revision 724459.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3690/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3690/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3690/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3690/console

          This message is automatically generated.

          Thomas Sandholm added a comment -

          all changes rolled into one patch + new capacity and fairshare scheduler tests

          Thomas Sandholm added a comment -

          new patch available

          Thomas Sandholm added a comment -

          new single patch with all changes

          Matei Zaharia added a comment -

          Thanks for the clarifications, Thomas, and sorry for the late reply (I'm currently at a conference and have little time for email). I'm still somewhat uncomfortable with modifying scheduler source because of the kind of precedent it sets - that people are allowed to change the internal logic of several existing schedulers to support a meta-scheduler. I imagine the Capacity Scheduler folks might have the same concern. Changing logic in an existing scheduler is okay if you are trying to improve the way it handles its original goal, but more dangerous if it might hinder how that scheduler can change to meet its goal in the future (or cause it to have to break the way it interacts with you). This is why I would really like it if things went through the "supported API" for the fair and capacity schedulers, which at this point is the config file. If there are problems with that not being reloaded frequently enough, it might be better to improve that aspect of the schedulers. This is obviously just my feeling, though, and I'd appreciate input from other people interested in scheduling.

          As an example of the API point, note that both the fair and capacity schedulers were planned in theory to have some extension points where you can plug in classes to do certain things. For the fair scheduler, there is a way to add a "weight adjuster" which can set job weights; I wouldn't mind providing a similar thing for pool weights if needed. I'm not sure whether such extension points exist in the capacity scheduler yet but I know they were part of the plan. A related point is that it may be easier to support just one of the two schedulers if this is worthwhile. It's somewhat unfortunate that there are two multi-user schedulers right now but they do have different philosophies and goals as far as I can tell from talking with Owen and Arun (the capacity scheduler is more focused on hard guarantees and the fair scheduler on flexibility, though these are converging somewhat).
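
          For reference, a weight-adjuster-style extension point might have roughly the following shape; this is an assumption for illustration, not the fair scheduler's actual interface:

            /**
             * Assumed shape of a pool/job weight-adjuster plugin; the real
             * fair scheduler hook may differ. A currency-driven adjuster
             * could scale weights by each owner's current spending rate.
             */
            public interface WeightAdjusterSketch {
              /** Returns the adjusted weight, given the scheduler's default. */
              double adjustWeight(String poolName, double currentWeight);
            }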

          One other thing I'd like to understand is how often you plan to be changing allocations, and why using config files would cause any kind of performance degradation (if they were reloaded fast enough, etc.). I can't imagine XPath being slow or I/O being a problem if you have a reasonable number of users, unless you are really changing the file a few times per second. Perhaps a process external to Hadoop is not the right solution, but even a meta-scheduler that does not modify the APIs might work.

          Vivek Ratan added a comment -

          Agree with Matei. Thomas, you really should be creating your own separate scheduler, rather than building a meta-scheduler, for various reasons:

          • Schedulers interact with the JT through the TaskScheduler class. This interface provides the contract between the JT and various schedulers (see HADOOP-3412). Having something else sit between the JT and various schedulers violates this contract.
          • We haven't really considered a 'meta scheduler' - what it is, what it does, whether it's needed. You're welcome to start a Jira on this issue. A meta-scheduler would, presumably, affect all schedulers, so what is it that you'd like it to do? Is there some functionality that might be better suited for the JT, perhaps?
          • Most importantly, the functionality you desire (affecting how priorities are changed for jobs) may not be something that all schedulers want. In fact, it is not something we currently want to do for the Capacity Scheduler. The requirements for the CapacityScheduler, as listed in HADOOP-3421, allow users to change job priorities at any time, and we have certain guarantees in terms of what happens when priorities of running or waiting jobs are changed. As things stand, your patch will cause the CapacityScheduler to behave differently from what it's supposed to do.

          You can also file separate Jiras for individually modifying the functionality of the Capacity and Fairshare schedulers (as Matei correctly points out, they do have different requirements and philosophies), but I'd recommend that you provide some use cases and examples of why the existing functionality is not good enough or what exactly is missing. With respect to the Capacity Scheduler, we have had a few discussions on resource budgets and penalizing/rewarding users based on their prior resource consumptions, but we currently don't have any concrete proposals.

          On a related note, you may want to re-use much of the implementation of the Capacity or Fairshare schedulers. This issue has come up before and we would like to make it easy to share code, but it's a separate issue with a separate discussion (I can't remember if we have a Jira filed for it).

          Vivek Ratan added a comment -

          Looking at your patch, it seems like the only change you make for the Capacity Scheduler is to have a separate config field called mapred.scheduler.shares which seems to contain the guaranteed capacities for various queues, and that you read this field often. We do plan to support updating the config values of various queues in the Capacity Scheduler, including guaranteed capacities, as and when required. See HADOOP-4522. If this Jira is implemented, do you see any other changes to the Capacity Scheduler?
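
          To make the mechanism concrete, a scheduler-side reader for such a property might look like the sketch below; the comma-separated queue=share encoding is an assumption for illustration, not necessarily what the patch uses:

            import java.util.HashMap;
            import java.util.Map;
            import org.apache.hadoop.conf.Configuration;

            /**
             * Sketch of reading per-queue shares from a Configuration
             * property. The "queue1=0.5,queue2=0.3" encoding is assumed;
             * the actual patch may encode the shares differently.
             */
            public class SharesReaderSketch {
              public static Map<String, Float> readShares(Configuration conf) {
                Map<String, Float> shares = new HashMap<>();
                String raw = conf.get("mapred.scheduler.shares", "");
                for (String pair : raw.split(",")) {
                  String[] kv = pair.trim().split("=");
                  if (kv.length == 2) {
                    shares.put(kv[0], Float.parseFloat(kv[1]));
                  }
                }
                return shares;
              }
            }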

          On a separate note, I wanted to add to an earlier comment I made. Resource budgets, and how they affect resource allocation to an MR job, are an interesting discussion. You might want to consider it a little more generally, rather than just affecting priorities. Are there general ways in which we can specify resource constraints on jobs/tasks (priorities, memory, CPU, etc.), a la resource managers like Torque? How do we detect what resources a job/task consumes? How do we penalize jobs (presumably, you may penalize a user differently for submitting too many high-priority jobs than for submitting too many high-memory jobs; or maybe not)? Can you plug in different penalizing policies? One of the assumptions the Capacity Scheduler makes is that users within a queue/org are cooperative, so if someone is submitting too many high-priority jobs and hogging queue resources, peer pressure may be a good way to control this, though this may not work well for all situations.

          Thomas Sandholm added a comment (edited) -

          Thanks for your input,

          Consumable quotas and budget accounting are a requirement that we have which is not supported by any of the schedulers today. They allow users themselves to change regulated priorities that remain valid in a competitive multi-user setting (where social peer-pressure assumptions break down). The idea here is that demand varies over time, as do user job priority preferences. When demand is high you would want to encourage only the most important jobs to be run, and give users with low-priority jobs an incentive to hold off on submitting their jobs. Also note that the priorities a user sets that do not affect her in any way tend to be very different from the priorities she would have to pay for in some way. Having access to a user's 'truthful' priorities allows the scheduler to do a more accurate job of efficiently mapping users to available resources while taking current demand into account.

          Back to the implementation approach. As I mentioned above, one approach I evaluated was to have a separate process that pushes the necessary changes to the config files. The fact that the capacity scheduler currently doesn't support dynamic updates of the config file is a minor issue in this context, and I actually also used a patch that fixed this. The more important showstopper for this approach was the need to replicate the whole reliable Hadoop service infrastructure. We have implemented our own systems and services to do much more involved budget accounting than this, but contributing that whole package to Hadoop would be too much work, and all of it may not be useful to the Hadoop community in general. So what I tried to do in this patch was to extract the most important pieces from our previous work that solve the above-mentioned problems, using as much of the existing Hadoop infrastructure as possible.

          Therefore, ideally we would like to plug in some code in the scheduler event loop that allows us to set priorities (that have been paid for and that are being accounted for towards a budget). Implementing our own scheduler altogether was an option, but we are not so interested in, and don't have the low-level expertise in, how the priorities should be enforced in the map/reduce context. Hence, it seemed natural to reuse the fairshare or capacity scheduler for this purpose. If we assume that we have a scheduler-collocated budget algorithm, it seems very roundabout and difficult to support multiple priority enforcers if we need to handle all the different configuration file formats of the individual schedulers. Fiddling around with XPath would also add configuration and parsing complexity, apart from limiting performance. A better solution in my opinion would be to have a way for the plugin to communicate and update priorities directly to the scheduler within the given scheduler framework. The only interface in the current code base I found that could be used for this purpose was Configuration properties. This in-memory approach also has the advantage that schedulers can implement more sophisticated enforcement of shares paid for by users, as both Vivek and Matei alluded to above.

          To summarize, my requirements for the scheduling framework are as follows:
          -a scheduler-independent plug point in the job tracker event loop, to host the budget accounting algorithm and to communicate paid-for shares to resource-share enforcers such as the existing two schedulers
          -a scheduler-independent interface to communicate paid-for shares to resource-share enforcers (this could still be 'standardized' XML config files if you find that appropriate, but that has the performance and complexity implications I mentioned above)

          The patch I submitted may not solve these problems in the absolutely optimal way, because I didn't want to change any interfaces in core or the scheduler framework itself. It represents my understanding of the simplest way to address these issues with the current interfaces, though, and it is a first attempt to contribute our work to the Hadoop community. Our fallback is to just pick one scheduler and modify the config file from within our system, but then we would not contribute anything to the community, and we would be left with a brittle interface to a specific scheduler.

          I will also talk through these issues with Owen and Arun when I meet them on Thursday and report back here.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12395619/HADOOP-4768.patch
          against trunk revision 725603.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 13 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3708/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3708/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3708/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3708/console

          This message is automatically generated.

          Thomas Sandholm added a comment -

          After talking to Owen and Arun about this patch and how it would fit best in the Hadoop framework, there was a concern about having some process that modifies the configuration files automatically. It could potentially render the whole service unusable, and the cluster operators would have problems with that. Instead, the current plan of action is as follows:

          1. Open a new JIRA in core/mapred to add a new interface to the scheduling framework along the lines of:

             interface QueueAllocation {
               public float getAllocation(String queue);
             }

          2. Provide a default implementation that reads the allocations from the mapred configuration file
          3. Provide an implementation of the budget allocation algorithm described above
          4. Provide an example usage patch for one of the schedulers (probably fair-share, because its code base is smaller)
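
          A minimal default implementation for step 2 might look like the following sketch; the class name and the property naming are placeholders, since only the interface shape above is fixed:

            import org.apache.hadoop.conf.Configuration;

            /**
             * Hypothetical default implementation of the QueueAllocation
             * interface sketched in step 1: reads a static per-queue share
             * from the mapred configuration. Property naming is assumed.
             */
            public class ConfiguredQueueAllocation implements QueueAllocation {
              private final Configuration conf;

              public ConfiguredQueueAllocation(Configuration conf) {
                this.conf = conf;
              }

              public float getAllocation(String queue) {
                // e.g. mapred.queue.research.allocation = 0.25
                return conf.getFloat("mapred.queue." + queue + ".allocation", 0.0f);
              }
            }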

          Johan Oskarsson added a comment -

          Canceling patch to remove it from the queue, due to the concerns mentioned above.

          Thomas Sandholm added a comment (edited) -

          Fixed up scheduler to be standalone and not rely on or change capacity or fairshare schedulers. Implemented interfaces recommended by Owen and Arun but kept them in contrib for now to avoid changing core classes. A new design doc is also available at: http://tycoon.hpl.hp.com/~sandholm/DynamicPriorityHadoop.pdf

          Thomas Sandholm added a comment -

          latest version tested with 0.21.0 trunk

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12404862/HADOOP-4768-2.patch
          against trunk revision 762987.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 11 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/161/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/161/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/161/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/161/console

          This message is automatically generated.

          steve_l added a comment -

          Thomas,

          isDynamic is protected and non-volatile, so it could be set by a subclass while an update was taking place. Do you think that is an issue? Should there be a protected setDynamic(boolean) method instead?
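
          For reference, the kind of change being suggested looks roughly like this (the enclosing class is assumed for illustration):

            /**
             * Sketch of the suggested fix: make the flag volatile so a
             * subclass toggling it mid-update is at least visible to other
             * threads, and expose it through a protected setter instead of
             * the bare field.
             */
            public abstract class DynamicFlagSketch {
              private volatile boolean isDynamic;

              protected void setDynamic(boolean dynamic) {
                this.isDynamic = dynamic;
              }

              protected boolean isDynamic() {
                return isDynamic;
              }
            }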

          steve_l added a comment -

          Thomas, I've had a look at this

          -general: check that lines stop at the Hadoop-recommended length and
          have the right spaces and indentation

          -could you have the mapred conf strings defined as constants in a single interface or class (with static imports)?

          -do we have to have everything in the org.apache.hadoop.mapred package? I know that the scheduler
          needs to be there, but it would be cleaner if we had a dynamic package for everything other than
          the scheduler. Of course, it depends on the access rights of whichever mapred classes
          get passed around

          -QueueAllocation: should the fields be private?

          -FileAllocationStore.save should always close() the output stream.
          If there is an exception saving, the filename should be printed as well
          as the exception. Same for the load; the close() should be in a finally
          clause. (A sketch of this pattern follows at the end of this list.)

          -tests should use assertEquals() for better errors
          This is critical for those
          that use floating point numbers, as they should include an
          allowed range for the values

          -TEST_DIR should be set up in the test setUp(),
          in case test runners set system properties on a
          test-by-test basis.

          -Lots of commonality in the test cases -could that be
          factored out into a base class for lower maintenance?

          -removeQueues() operations should be in the teardown so that
          they run even if the tests fail (though you may need an empty test to see what happens if you try to remove a queue that was never created)

          -Consider adding tests for the PriorityScheduler comparators
          -consider a test for the PriorityScheduler authorize logic
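
          A sketch of the save pattern asked for above, with hypothetical class and reporting details; only the close-in-finally and include-the-filename shape matter here:

            import java.io.FileOutputStream;
            import java.io.IOException;
            import java.io.PrintWriter;

            /**
             * Illustrates the review points: always close the stream in a
             * finally clause, and include the filename when reporting
             * failures. Class name and logging details are hypothetical.
             */
            public class FileStoreSketch {
              public void save(String fileName, String contents) {
                PrintWriter out = null;
                try {
                  out = new PrintWriter(new FileOutputStream(fileName));
                  out.print(contents);
                } catch (IOException e) {
                  // Report the filename along with the exception.
                  System.err.println("Failed to save " + fileName + ": " + e);
                } finally {
                  if (out != null) {
                    out.close(); // runs even if the write throws
                  }
                }
              }
            }

          Similarly, the floating-point test assertions would pass an allowed delta, e.g. assertEquals(expectedShare, actualShare, 0.001f) in JUnit.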

          steve_l added a comment -

          cancel to resubmit

          steve_l added a comment -

          Patch with some of my suggestions partially applied

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12407998/HADOOP-4768-3.patch
          against trunk revision 774433.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 11 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/335/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/335/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/335/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/335/console

          This message is automatically generated.

          Thomas Sandholm added a comment -

          Following Steve's comments:
          -fixed test cases
          -fixed formatting issues
          -fixed FileAllocationStore issues
          -fixed QueueAllocation issue

          steve_l added a comment -

          Thomas,
          I've pulled this down and am running the tests. Given this is a contribution with some new features and tests, would anyone have objections to me committing it to trunk?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12408274/HADOOP-4768-4.patch
          against trunk revision 776508.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 11 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/361/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/361/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/361/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/361/console

          This message is automatically generated.

          Thomas Sandholm added a comment -

          None of the test failures look related to my patch as far as I can tell.

          steve_l added a comment -

          Updated patch:
          - took all the IDE's suggestions on board
          - factored out the base class for the tests, much less duplication
          - swapped the order of the equality assertions (it doesn't matter for the tests, but it does for the reporting)
          - moved the new config options into a class listing them all, to eliminate duplication and typos

          This patch was generated on a Mac, and the diff file includes lots of "+eol-style native" entries. I'm not sure whether those are good or bad, or whether I need to do more .svn configuration here. Suggestions?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12408819/HADOOP-4768-5.patch
          against trunk revision 778388.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 17 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/400/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/400/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/400/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/400/console

          This message is automatically generated.

          steve_l added a comment -

          This is the previous patch with the line-ending property stripped.

          steve_l added a comment -

          If Hudson is happy with this, I will commit it.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12409811/HADOOP-4768-6.patch
          against trunk revision 782083.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 13 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/470/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/470/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/470/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/470/console

          This message is automatically generated.

          steve_l added a comment -

          The failing test is in HDFS proxy, so it has to be unrelated:

          java.lang.NullPointerException
          	at org.apache.commons.cli.GnuParser.flatten(GnuParser.java:110)
          	at org.apache.commons.cli.Parser.parse(Parser.java:143)
          	at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:374)
          	at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
          	at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:138)
          	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1314)
          	at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:414)
          	at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:278)
          	at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:119)
          	at org.apache.hadoop.hdfsproxy.TestHdfsProxy.testHdfsProxyInterface(TestHdfsProxy.java:209)
          
          Thomas Sandholm added a comment -

          Yes, this test failure is present in all Hudson builds now, so it should be unrelated to this patch.

          steve_l added a comment -

          OK, I'm patching it as we speak. Thomas, if this breaks things, you get to buy the developers beer at the Hadoop Summit.

          steve_l added a comment -

          Committed. Thomas, I don't think this goes into the distributions, so the root changes/release notes haven't been updated.

          Robert Chansler added a comment -

          Editorial pass over all release notes prior to publication of 0.21.

          Usage instructions:

          Overview
          --------
          The purpose of this scheduler is to allow users to increase and decrease
          their queue priorities continuously to meet the requirements of their
          current workloads. The scheduler is aware of the current demand and makes
          it more expensive to boost the priority during peak usage times. Thus
          users who move their workload to low-usage times are rewarded with
          discounts. Priorities can only be boosted within a limited quota.
          All users are given a quota, or budget, which is deducted periodically
          at configurable accounting intervals. How much of the budget is
          deducted is determined by a per-user spending rate, which may
          be modified at any time directly by the user. The share of cluster
          slots allocated to a particular user is computed as that user's
          spending rate divided by the sum of all spending rates in the same
          accounting period.
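
          As a minimal sketch of this proportional-share rule (illustrative only;
          the class and method names below are made up and are not part of the
          patch):

            import java.util.HashMap;
            import java.util.Map;

            /** Hypothetical sketch of the proportional-share rule described above. */
            public class ShareSketch {
              /**
               * Computes each queue's share of cluster slots for one accounting
               * period: share(q) = spendingRate(q) / sum of all spending rates.
               */
              static Map<String, Double> computeShares(Map<String, Double> spendingRates) {
                double total = 0.0;
                for (double rate : spendingRates.values()) {
                  total += rate;
                }
                Map<String, Double> shares = new HashMap<String, Double>();
                for (Map.Entry<String, Double> e : spendingRates.entrySet()) {
                  // If nobody is spending this period, nobody gets a share.
                  shares.put(e.getKey(), total > 0.0 ? e.getValue() / total : 0.0);
                }
                return shares;
              }

              public static void main(String[] args) {
                Map<String, Double> rates = new HashMap<String, Double>();
                rates.put("queueA", 5.0);  // spends 5 credits per accounting interval
                rates.put("queueB", 15.0); // spends 15 credits per interval
                // Prints shares of 0.25 for queueA and 0.75 for queueB.
                System.out.println(computeShares(rates));
              }
            }

          Raising a spending rate therefore buys a larger share only relative to
          what everyone else is spending in the same period, which is what makes
          peak times more expensive.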

          Configuration
          -------------
          This scheduler comprises two components: an accounting (resource allocation) part that
          manages and bills for queue shares, and a scheduling part that
          enforces the queue shares in the form of map and reduce slots for running jobs.

          Hadoop Configuration (e.g. hadoop-site.xml):
          mapred.jobtracker.taskScheduler
          This needs to be set to
          org.apache.hadoop.mapred.DynamicPriorityScheduler
          to use the dynamic scheduler.
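
          A hadoop-site.xml fragment for this would look like the following
          (assuming the DynamicPriorityScheduler class name above):

            <property>
              <name>mapred.jobtracker.taskScheduler</name>
              <value>org.apache.hadoop.mapred.DynamicPriorityScheduler</value>
            </property>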
          Scheduler Configuration:
          mapred.dynamic-scheduler.scheduler
          The Java class name of the MapReduce scheduler that should
          enforce the allocated shares.
          Has been tested with (and defaults to):
          org.apache.hadoop.mapred.PriorityScheduler
          mapred.priority-scheduler.acl-file
          Full path of the ACL file, with entries of the form
          <user> <role> <secret key>
          separated by line feeds.
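
          For example, a two-entry ACL file (user names and keys here are invented):

            alice admin 4b825dc6
            bob user 9f86d081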
          mapred.dynamic-scheduler.budget-file
          The full OS path of the file from which the
          budgets are read and stored. Entries have the form
          <queue name> <budget> <spending rate>
          separated by newlines, where the budget can be specified
          as a Java float. If the server is running, the file should not
          be edited directly but through the servlet API, to ensure
          proper synchronization.
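
          For example, a budget file giving queue1 a budget of 1000 at a
          spending rate of 5.0 (queue names and values are illustrative):

            queue1 1000.0 5.0
            queue2 500.0 1.0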

          mapred.dynamic-scheduler.alloc-interval
          The allocation interval at which the scheduler rereads the
          spending rates and recalculates the cluster shares,
          specified in seconds between allocations.
          The default is 20 seconds.
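
          Putting the scheduler-specific keys together, a configuration fragment
          might look like this (file paths are illustrative; omitted keys fall
          back to the defaults above):

            <property>
              <name>mapred.dynamic-scheduler.scheduler</name>
              <value>org.apache.hadoop.mapred.PriorityScheduler</value>
            </property>
            <property>
              <name>mapred.priority-scheduler.acl-file</name>
              <value>/etc/hadoop/scheduler-acl.txt</value>
            </property>
            <property>
              <name>mapred.dynamic-scheduler.budget-file</name>
              <value>/etc/hadoop/scheduler-budget.txt</value>
            </property>
            <property>
              <name>mapred.dynamic-scheduler.alloc-interval</name>
              <value>20</value>
            </property>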


            People

            • Assignee:
              Thomas Sandholm
              Reporter:
              Thomas Sandholm
            • Votes:
              0
              Watchers:
              11
