Hadoop YARN
YARN-392 (sub-task of YARN-397: RM Scheduler api enhancements)

Make it possible to specify hard locality constraints in resource requests

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0-beta
    • Fix Version/s: 2.1.0-beta
    • Component/s: resourcemanager
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently it is not possible to specify scheduling requests for specific nodes and nowhere else. The RM automatically relaxes locality to rack and * and assigns non-specified machines to the app.

      1. YARN-392.patch
        24 kB
        Sandy Ryza
      2. YARN-392-1.patch
        5 kB
        Sandy Ryza
      3. YARN-392-2.patch
        12 kB
        Sandy Ryza
      4. YARN-392-2.patch
        12 kB
        Sandy Ryza
      5. YARN-392-2.patch
        12 kB
        Sandy Ryza
      6. YARN-392-3.patch
        18 kB
        Sandy Ryza
      7. YARN-392-4.patch
        19 kB
        Sandy Ryza
      8. YARN-392-5.patch
        19 kB
        Sandy Ryza
      9. YARN-392-6.patch
        19 kB
        Sandy Ryza
      10. YARN-392-7.patch
        20 kB
        Sandy Ryza
      11. YARN-392-8.patch
        20 kB
        Sandy Ryza


          Activity

          Sandy Ryza added a comment -

          Bikas, if you haven't started work on this, I'd be interested in taking a crack at it.

          Bikas Saha added a comment -

          Since you have already gone ahead and assigned it to yourself, why don't you take a shot?

          If YARN-393 gets solved then my guess is that this jira might get automatically resolved. However, YARN-393 is even more complex and potentially hard to fix.

          It would be good if you can post your ideas/approach first instead of a patch. This jira might be tricky and so discussing alternatives and agreeing on one of them would be a good exercise. I will try to post an alternative to this jira too.

          Sandy Ryza added a comment -

          Agreed that we should discuss before committing to an approach. The first approach that comes to my mind would be to have a boolean flag with each resource request that says whether or not it's ok to schedule it somewhere else. An issue with this is that it would be impossible to make a request like "I want a container to be only on one of these two nodes".

          Also agreed that this and YARN-393 are closely tied together, so I'd be happy to take on both if that ends up making the most sense.

          Sandy Ryza added a comment -

          I've thought about this a little further. The alternative that occurs to me would be to have the option to associate a group ID with a resource request. Under the current model, when a container is assigned, requests are decremented "up", i.e. if it's a node-local container, the requests for the corresponding rack-local container and * are decremented. This would remain the same, unless the assigned container has a group ID, in which case all other requests with that group ID would be decremented instead.
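
          As a purely illustrative example of that bookkeeping difference (the node and rack names and counts here are made up, not taken from any attached patch):

            Decrement "up" (current model), after one container is assigned on node1:
              location=node1 - containers=1 -> 0
              location=rack1 - containers=1 -> 0
              location=*     - containers=1 -> 0

            Decrement by group (proposed), with node1 and node2 tagged group=g1, after one container on node1:
              location=node1 (group=g1) - containers=1 -> 0
              location=node2 (group=g1) - containers=1 -> 0
              location=rack1 and location=* - left untouched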

          In some ways, this resembles the task-centric approach proposed in YARN-371, but it avoids most of the performance implications by allowing resource-centric scheduling for applications like mapreduce that don't have these special needs. The disadvantage of this approach would be that it would require more complicated scheduling logic and data structures to handle the two cases. The advantage of it is that it would be able to represent node-only requests with multiple nodes, which may be essential for some applications. It could also be overloaded to handle gang-scheduling.

          I'm having trouble coming up with anything that's substantively different from these two approaches. Are there other alternatives I'm missing?

          Sandy Ryza added a comment -

          Uploaded a first pass at what the second approach might look like.

          Alejandro Abdelnur added a comment -

          I think the 2nd approach (introducing an ID in the ResourceRequest) is preferable to the boolean flag as it will allow expressing a set of OR localities.

          I've been thinking a bit on how adding/handling this new piece of information would affect the RM/Scheduler.

          After running some use cases on the whiteboard, I think handling this ID is entirely optional for the scheduler to implement; the scheduler could completely ignore this ID and things would still work correctly. In such a scenario, an AM would just receive additional container allocations, which it would in turn return to the RM without using them.

          A scheduler implementing this ID-tracking functionality would need an additional cross-reference Map, keyed by AppId+ID, over pending requests in order to access and decrement the corresponding outstanding resource request. This does not seem overly complicated.

          One small change I would suggest is for the resource allocation response to the AM to include the ID that is being satisfied (in case the scheduler handles IDs). This will help the AM make better decisions and do bookkeeping on what to use the received allocation for.
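
          A minimal sketch of the cross-reference bookkeeping described above (the map layout and method are illustrative assumptions, not code from the attached patches):

            // Hypothetical scheduler-side index: per application, map a request ID to the
            // outstanding ResourceRequests that carry that ID.
            Map<ApplicationId, Map<String, List<ResourceRequest>>> requestsById =
                new HashMap<ApplicationId, Map<String, List<ResourceRequest>>>();

            // When a container satisfying a request with the given ID is allocated,
            // decrement every request registered under that ID instead of walking the
            // usual node -> rack -> * decrement path.
            void decrementForId(ApplicationId appId, String id) {
              for (ResourceRequest req : requestsById.get(appId).get(id)) {
                req.setNumContainers(req.getNumContainers() - 1);
              }
            }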

          Bikas Saha added a comment -

          From what I understand this seems to be tangentially going down the path of the discussion that happened in YARN-371. The crucial point is that the YARN resource scheduler is not a task scheduler. So introducing concepts that directly or indirectly make it do task scheduling would be inconsistent with the design. It's a coarse-grained resource allocator that gives the app containers representing chunks of resources, using which the app can schedule its tasks. Different versions of the scheduler (Fair, Capacity, or otherwise) change the way the resource sharing is done. Ideally we should have only 1 scheduler that has hooks to change the sharing policy. The code kind of reflects that because there is so much common code/logic between both implementations.

          Unfortunately, in both the Fair and Capacity Scheduler the implementations have mixed up 1) the decision to allocate at and below a given topology level [say the * level] with 2) whether there are resource requests at that level. E.g. when the allocation cycle is started for an app, the logic starts at * and checks if the resource request count > 0. If yes, it then goes into racks and then nodes. Which means that if an application wants resources only at a node then it has to create requests at the rack and * level too. This is because locality relaxation has gotten mixed up with being "schedulable", if you catch my drift. My strong belief is that if we can fix this overload then we won't need to fix this jira. However I can see that fixing the overload will be a very complicated knot to untie and perhaps impossible to do now because it may be inextricably linked with the API. Which is why I created this jira.

          Now, if the problem is the * overloading that I describe above, then the underlying issue is the entanglement of delay scheduling (for locality). Here is an alternative proposal that addresses this problem. Let's make the delay of delay scheduling specifiable by the application. So an application can specify how long to wait before relaxing its node requests to rack and *. When an app wants containers on specific nodes it basically means that it does not want the RM to automatically relax its locality - thus specifying a large value for the delay. The end result being allocation on specific nodes if resources become available on those nodes. This also serves as a useful extension of delay scheduling. Short apps can be aggressive in relaxing locality while long+large jobs can be more conservative in trading off scheduling speed against network IO.
          The catch in the proposal is that such requests have to be made at a different priority level. Resource requests at the same priority level get aggregated, and we don't want to aggregate relaxable resource requests with non-relaxable resource requests. I think this is a good thing to do anyway because it makes the application think and decide which kind of tasks it needs to get running first.
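
          For instance (an illustrative split, not prescribed by any patch here), an app could separate its strict and relaxable asks by priority:

            priority=1: location=node1 - containers=1,  delay=infinite  (never relax off node1)
            priority=2: location=*     - containers=10, delay=default   (relax freely)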

          An extension of this approach also ties in nicely with the API enhancement suggested by YARN-394. The RM could actually inform the app that it has not been able to allocate a resource request on a node and the time limit has elapsed. At which point, the app could cancel that request and ask for an alternative set of nodes. I agree I am hand-waving in this paragraph.

          Thoughts?

          Sandy Ryza added a comment -

          The proposal of per-app delay-scheduling parameters is one I hadn't thought of, and I think a good one for many use cases. Do you mean that the delay threshold would be configurable per-app or per-priority?

          The cases that I don't think it supports are:

          • If the delay threshold is only configurable per app, an app needs some containers strictly on specific nodes, and for other containers only has loose preferences.
          • An application wants two containers, the first on only node1 or node2 and the second on only node3 or node4. What tells the scheduler not to assign both of the containers on node1 and node2? These containers could be requested at different priorities, but that would essentially be using priorities to do task-centric scheduling.

          Are these use cases non-goals for YARN? Correct me if I'm wrong, but my understanding was that the primary reason that the resource scheduler is not a task scheduler is for performance reasons. If we can allow it to be task-centric when necessary, but avoid the performance impact of making it task-centric all the time, it will support location-specific scheduling in the most flexible and intuitive way.

          I hope this isn't rehashing the debate from YARN-371. For anybody who will be at the YARN meetup tomorrow, it would be great to chat about this for a couple minutes.

          Alejandro Abdelnur added a comment -

          I'd like to restate the problem.

          Making things a bit more high level, the end goal is for an AM to give certain hints to the RM scheduler on how it plans to use requested resources.

          Hints are just that, 'hints'. They may not be taken into consideration; AMs must not rely on hints in order to work properly. It is fine if the RM scheduler ignores hints completely (because it is too busy or because it does not understand them). An RM scheduler that understands a hint may use it to make more optimal allocation decisions and may give AMs a speed boost.

          Another thing to keep in mind is that hints won't complicate the RM logic, as the RM's only involvement is passing them to the scheduler.

          Examples of hints are: gang scheduling, desired locality, desired multi-locality, resources fulfillment timeout, future resource allocation.

          I can understand the worries about going task centric, but I think the hints approach is a bit different. Being able to specify hints will enable scheduling feature experimentation without requiring protocol changes. If we find that a hint is a good feature to support at the scheduler API level, we may eventually add it to the protocol/API.

          The changes in the protocol/API would be as simple as having an extra String field in resource requests and resource allocations to indicate hints (on requests) and receive the hints taken into consideration (on allocations).

          Thoughts?

          Bikas Saha added a comment -

          That would be a different JIRA, don't you think?

          Bikas Saha added a comment -

          @Sandy
          It would be per priority, like I mentioned in my explanation. Otherwise it won't be useful for a mixed scenario like you mention in 1).
          Your example 2 highlights the inherent lossy nature of the protocol and is orthogonal to this jira. E.g. take the mapreduce case, and say map1 is local to node1 and node2 while map2 is local to node3 and node4. They also have rack and * preferences. Nothing prevents the RM from allocating node1 and node2 to the app if node2 and node3 happen to be on the same rack. The app has to cancel node2 from its resource map as soon as it satisfies map1 with node1. That is why we need (and have) a method to release unwanted containers. In the general case, distributed allocation (where request and grant are across nodes) is subject to race conditions.

          Sandy Ryza added a comment -

          Filed YARN-424 for Alejandro's hints proposal.

          Arun C Murthy added a comment -

          Gang-scheduling, deadline-scheduling etc. are not hints.

          They are core features in the scheduler and should be treated as such i.e. we should explicitly support them in the scheduler.

          I've commented on YARN-424 too.

          Arun C Murthy added a comment -

          Also, we don't have to support all features in the YARN scheduler on day one - let's be judicious and careful.

          We need to ship the first stable release of YARN (which we are close to), learn from our experiences and then add things.

          Arun C Murthy added a comment -

          Take a look at YARN-398 for a simpler alternate proposal without needing to resort to hints etc.

          Sandy Ryza added a comment -

          Uploading a patch based on the earlier discussion here and on YARN-398. The patch adds a boolean flag to each resource request which essentially means "don't schedule using this resource request or any above it" and adds support for it to the fair scheduler. I call the flag "noAllocateAt", but we could definitely use a better name if anybody has suggestions. I didn't use "blacklist" because it already has a meaning in the context of mapreduce, and to me seems to imply that a blacklisted rack would not allow any containers to be scheduled anywhere on it, when the meaning is a little different.

          Bikas Saha added a comment -

          How about calling it disableLocalityRelaxation, as that's what it is basically doing? When specified on a node it would mean do not relax locality to rack or ANY. Potentially, we could also say that when specified on a rack it would mean do not relax locality to ANY. Do you think it could be used to specify either exact nodes or exact racks? In that case, we would need to check that if this flag is set then either nodes or racks, but not both, are specified.
          I am not sure how to make sense of 2 different asks to the RM (at the same priority) that say
          1) allocate at specific node A (ie do not relax locality to rackA)
          2) allocate at specific rack rackA (ie do not relax locality to ANY) where node A is contained in rackA

          Sandy Ryza added a comment -

          It seems like we have two slightly different proposals, both of which add a boolean flag to ResourceRequests. In one, suggested by me earlier on this JIRA and by Bikas in his previous comment, if I want a container only at node1, I submit my request for node1 with the flag turned on. For this proposal, I think the way that makes most sense would be to require a node-level request with the flag turned on to be accompanied by rack-level and *-level requests with the flag turned on. An advantage of this approach is that it feels a little more intuitive. A disadvantage is that it requires modifying the scheduler data structures to separately account for node-specific requests.

          In another, suggested by Arun in YARN-398 and implemented by me in the March 28 patch, if I want a container only at node1, I set the flag on the rack that node1 is on. An advantage of this approach is that it allows blacklisting, i.e. saying I'm ok with a container anywhere but node2. A disadvantage is that it does not allow some requests at the same priority to be node-specific and others not.
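
          To make the difference concrete, here is how "one container, only on node1 (which lives on rack1)" would look under each proposal (illustrative request sets; the flag name is not final):

            Proposal 1 (flag on the node-level request, plus flagged rack/* requests):
              location=node1 - containers=1, flag=true
              location=rack1 - containers=1, flag=true
              location=*     - containers=1, flag=true

            Proposal 2 (flag on the enclosing rack, per YARN-398 and the March 28 patch):
              location=node1 - containers=1
              location=rack1 - containers=1, flag=true   (no allocation at rack1 or above)
              location=*     - containers=1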

          I'm not convinced yet on which approach to take. Am I representing all the options? Is there a usecase for blacklisting? Is there a use case for having some requests at a priority be node-specific and others not?

          Bikas Saha added a comment -

          Sorry, I did not see that patch carefully and assumed that it does what is suggested in https://issues.apache.org/jira/browse/YARN-392?focusedCommentId=13583713&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13583713 whereas it actually implements the proposal in YARN-398.
          The typical use case for blacklisting is to disable a set of nodes globally, e.g. never give me nodes A and B even when I ask for resources at *. Having to implement blacklisting on a per-priority basis will make the common case painful to work with. So I am not in favor of such a proposal unless there is a strong use case for blacklisting on specific priorities. Arun, Vinod and I had an offline discussion where we agreed that we are better off creating an API for blacklisting a set of nodes.

          Thomas Graves added a comment -

          Bikas when you say creating an API for blacklisting a set of nodes are you basically referring to YARN-398 or something else?

          Bikas Saha added a comment -

          Yes YARN-398 but not the proposal currently in there. The alternative proposal is to have a new method in AM RM protocol using which the AM can blacklist nodes globally for all tasks (at all priorities) for that app.

          Sandy Ryza added a comment -

          Ok, I will work on a patch for the non-blacklist proposal. To clarify, should location-specific requests be able to coexist with non-location-specific requests at the same priority?

          Bikas Saha added a comment -

          I don't think it's possible for location-specific and non-location-specific requests to live at the same priority. This is mainly because of the way current schedulers are implemented in the RM (grouped together and keyed by location and priority). Such requests have to be separated by priority, and that may not be a bad thing IMO.
          I discussed this offline with Vinod Kumar Vavilapalli and I would like to suggest an extension to the approach. Instead of a flag, how about specifying a time interval that tells the RM how long to wait before dropping locality? A time interval of infinite would be the same as a boolean flag, so this approach covers the other one. Additionally, it lets a large job be more conservative about dropping locality in favor of latency and a short job more aggressive. Currently the value of this interval comes from config and maps to the number of scheduling attempts missed by this request. This is done by keeping a count of node heartbeats. Given the number of nodes and the heartbeat interval, the user-specified time interval can easily be mapped to a count that matches the current implementation. So this will not be a perf hit nor a change in logic compared to existing code.
          Another thing to consider is allowing users to say I want to be scheduled only on these racks. Again, I don't think we can mix node-specific and rack-specific scheduling at the same priority.

          Arun C Murthy added a comment -

          Bikas Saha I'm against using timers for specifying locality delays - it doesn't make sense for a variety of reasons documented elsewhere.


          Sandy Ryza I just uploaded a patch I lost track of for a week or so on YARN-398. Looks like we both are doing the same thing. I'm happy to repurpose one of the two jiras for CS while the other can do the same for FS. Makes sense?

          In my patch I called the flag 'strictLocality', which defaults to 'false'. That should solve the need for white-lists. Makes sense?


          I agree we should tackle black-listing separately.

          Arun C Murthy added a comment -

          To be clear, the approach I took on YARN-398 allows for the 'I want only one container, and only on node1 or node2' use-case.

          Arun C Murthy added a comment -

          Also, it allows for 'I want one container on any one of the following n racks' too.

          Bikas Saha added a comment -

          I'm against using timers for specifying locality delays - it doesn't make sense for a variety of reasons documented elsewhere.

          Can you please point me to them?

          Sandy Ryza added a comment -

          Arun C Murthy, that makes sense to me. We can use this one for FS and YARN-398 for CS? Do you think this should go into FIFO as well?
          Bikas Saha, if we went with your proposal, would it not make sense to go with the convention used in the FS/CS already, in which the locality delay is a fraction of the cluster size? In your proposal, if I want a node-local container at node1, would I specify the locality delay on the request for node1 or on the request for the rack that node1 is on?

          Sandy Ryza added a comment -

          Any further thoughts on this?

          Sandy Ryza added a comment -

          Uploaded a patch that adds a test for strict locality in the fair scheduler.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12579409/YARN-392-2.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/773//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12579412/YARN-392-2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/774//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/774//console

          This message is automatically generated.

          Bikas Saha added a comment -

          would it not make sense to go with the convention used in the FS/CS already, in which the locality delay is a fraction of the cluster size? In your proposal, if I want a node-local container at node1, would I specify the locality delay on the request for node1 or on the request for the rack that node1 is on?

          Yes. I mentioned in the proposal that the user time interval would be converted to the count the RM is using currently, using the total cluster size and the NM heartbeat interval. So if the cluster size is C, NMs heartbeat every S seconds, and the user interval is T seconds, then wait for (T/S)*C NM heartbeats. So it's not any more expensive wrt CPU for the scheduler than what we currently have.
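
          For example (numbers made up): with C = 1000 nodes, an NM heartbeat every S = 1 second, and a user interval of T = 5 seconds, the request would wait for (T/S)*C = 5000 node heartbeats before locality is relaxed.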

          Specific locality for node would set the delay on the rack and specific locality for ANY would set the delay on ANY. Isn't that how we set the noAllocateAt flag for the boolean approach?

          Alejandro Abdelnur added a comment -

          The idea of a locality delay gives more flexibility, I like it.

          Though I think it may complicate things quite a bit in the scheduler to do the right book-keeping. Today, because the delay is at the app level, there is no delay counting at the allocation-request level.

          If we move the delay to the allocation-request level, it means we'd have to keep a counter at the 'rack' level that gets decremented on every allocation attempt; when it hits zero we go to rack if the node request was not fulfilled.

          This means that the allocation request in the scheduler will have to have a new delay-counter property.

          The complexity comes from the fact that when an AM places a new allocation request, it must provide the non-empty allocation requests in full.

          For example:

          Requesting 5 containers for node1/rack1:

            location=*     - containers=5
            location=rack1 - containers=5
            location=node1 - containers=5
          

          Requesting 5 additional containers for node2/rack1 (with original allocation still pending):

            location=*     - containers=10
            location=rack1 - containers=10
            location=node2 - containers=5
          

          The current contract allows the scheduler just to put the */rack container requests without having to do a lookup and aggregation.

          If we are now keeping a delay counter associated at the */rack level and we do a put, we'll reset the delay counter for the node1 request family. If we keep the delay counter of the node1 request family and use it for the node2 request family, we'll be shortchanging the node2 request's expected locality delay.

          The per-request locality delay has value but I think it may require much more work.

          Given that, don't you think going with the current approach suggested/implemented by Arun/Sandy makes sense in the short term?

          Alejandro Abdelnur added a comment -

          Bikas Saha, are you OK going with a first approach without indicating the timeout in the request?

          Sandy Ryza, the patch looks good, would you please update the patch fixing the failing testcase?

          Sandy Ryza added a comment -

          Latest patch should fix the failing testcase

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12582503/YARN-392-2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/902//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/902//console

          This message is automatically generated.

          Bikas Saha added a comment -

          I am fine with a boolean because the client can wait for the same timeout and then unset the flag using a new resource request, if it wants to. We need to have a test that verifies this behavior.
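
          An illustrative sequence of that fallback (times and names made up):

            t=0s   AM asks: location=node1, location=rack1 (flag=true), location=*
            t=30s  nothing allocated; the AM's own timeout expires
            t=30s  AM re-sends location=rack1 with the flag cleared, so the RM may now relax to rack1/*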

          About the last patch
          How about disableAllocation as an alternative name?

          Why is there a disable-node-local request? Node-specific == (disable rack + *). Rack-specific == (disable *). Where does disable-node make sense?

          +        if (localRequest != null && localRequest.getNoAllocateAt()) {
          +          continue;
          +        }
          

          The two checks seem to be different. One checks for containers > 0 while the other does not. Don't know if that matters in the fair scheduler?

          +        if (rackLocalRequest != null && rackLocalRequest.getNoAllocateAt()) {
          +          continue;
          +        }
           
                   if (rackLocalRequest != null && rackLocalRequest.getNumContainers() != 0
          

          I don't see rack info being set anywhere. Shouldn't the nodes end up getting rack==default-rack? If that's true then sending a rack request for rack=rack1 is probably not testing what was intended, right?

          +    ResourceRequest rackRequest = createResourceRequest(1024, "rack1", 1, 1, true);
          

          As discussed earlier in this jira, it's not possible to mix strict and non-strict allocations at the same priority. I don't see that being checked/enforced anywhere. Similarly, it does not look like we can mix strict-node and strict-rack at the same priority.

          Are capacity scheduler changes not targeted for this patch?

          Arun C Murthy added a comment -

          Can we please call it 'fallThrough' or some such? 'getNoAllocateAt' seems very confusing.

          Bikas Saha added a comment -

          Or "relaxLocality" such that its true always and disabled for specific case.

          Sandy Ryza added a comment -

          Disable-node allows you to say "I specifically don't want a container on this node". I can't speak to whether this is a useful feature, but it makes the semantics consistent in the sense that the flag can simply mean "don't directly use this ResourceRequest for an allocation". Thoughts?

          If we allow Disable-node, I like disableAllocation as the name. Otherwise, fallThrough or relaxLocality both seem good to me.

          The two checks seem to be different. One checks for containers > 0 while the other does not. Dont know if that matters in the fair scheduler?

          This is correct behavior, according to the way I was envisioning it. If we hit a disableAllocation flag, we want to abort trying to allocate this node/priority to the app entirely. On the other hand, if the node/rack has 0 requests, we want to fall through to the next level. I'll try to see if there's a clearer way to structure the code.
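          Roughly, the distinction being described reads like the following sketch; this is not the patch text, and getNoAllocateAt is just the flag name from the patch under review here:

              // Inside the scheduler's per-node assignment pass, for each priority:
              if (localRequest != null && localRequest.getNoAllocateAt()) {
                continue;   // flag set: give up on this priority for this node entirely
              }
              if (localRequest != null && localRequest.getNumContainers() > 0) {
                // try a node-local assignment for this priority
              } else {
                // zero node-local containers: fall through to rack-local / off-switch
              }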

          As discussed earlier in this jira, its not possible to mix strict and non-strict allocations at the same priority. I dont see that being checked/enforced anywhere.

          I'll add that in. What should we do if the submitted ResourceRequests are invalid?

          Capacity scheduler changes are targeted for YARN-398.

          Alejandro Abdelnur added a comment -

          'fallThrough' please (even if I cannot pronounce it).

          | As discussed earlier in this jira, its not possible to mix strict and non-strict allocations at the same priority. I dont see that being checked/enforced anywhere.

          I'll add that in. What should we do if the submitted ResourceRequests are invalid?

          I don't think we need to worry about this; a request for a location will override the previous one, so an invalid mix cannot happen.

          Bikas Saha added a comment -

          Actually I don't quite get fallThrough. Fall through to what?

          I don't think we need to worry about this, an request for a location will override the previous one, thus an invalid mix cannot happen.

          How about this case: at time T1 I make a request at priority P1 for specific node N1 in rack R1, so now R1 and * have relaxLocality set to false to prevent allocation at rack/* for that priority. Then at time T2 I make a request at priority P1 for specific rack R1. That would require the relaxLocality flag to be false only on *, which is incompatible with the flags already set. Next case: at T1 I make a P1 request for specific node N1 in rack R1, and at T2 I make a non-strict P1 request for node N2 in the same rack R1. Now the relaxLocality flag on rack R1 is incompatible between the two requests.
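          To make the two sequences concrete, here is a small self-contained sketch (plain Java, no YARN classes, names made up) of the per-priority flag table the scheduler would effectively have to keep, and where the T2 requests collide with it:

              import java.util.HashMap;
              import java.util.Map;

              public class RelaxLocalityConflict {
                public static void main(String[] args) {
                  // relaxLocality recorded per resource name for priority P1
                  Map<String, Boolean> relaxAtP1 = new HashMap<>();

                  // T1: strict request for node N1 in rack R1
                  relaxAtP1.put("N1", true);   // node-level entry
                  relaxAtP1.put("R1", false);  // do not relax to the rest of the rack
                  relaxAtP1.put("*", false);   // do not relax to the rest of the cluster

                  // T2, case 1: a rack-specific request for R1 needs R1=true and *=false,
                  // but R1 is already false at this priority.
                  boolean case1Conflicts = !relaxAtP1.get("R1");

                  // T2, case 2: a non-strict request for node N2 in R1 also needs R1=true,
                  // colliding with the same stored flag.
                  boolean case2Conflicts = !relaxAtP1.get("R1");

                  System.out.println("case 1 conflicts: " + case1Conflicts);
                  System.out.println("case 2 conflicts: " + case2Conflicts);
                }
              }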

          I'll add that in. What should we do if the submitted ResourceRequests are invalid?

          Looks like this will be blocked by YARN-394. The only way out of this would be if we can validate such requests at the time of the allocate RPC itself and throw an invalid-request exception. If we cannot do that, and can only find this out in the scheduling loop, then we are blocked on YARN-394.

          Sandy Ryza added a comment -

          Looks like this will be blocked by YARN-394.

          The APIs already allow for invalid requests. Nothing stops me from submitting multiple ResourceRequests at the same priority with different capabilities, which makes no sense and can cause problems inside the scheduler. The AMRMClient APIs should be added to ensure that this does not happen.
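          For illustration, this is the kind of request pair nothing rejects today (a sketch only; the newInstance factories and setter names are assumptions about the records API):

              // Two ANY-level requests at the same priority but with different capabilities.
              // The protocol accepts both in a single allocate() call, even though, as noted
              // above, the combination makes no sense to the scheduler.
              ResourceRequest small = Records.newRecord(ResourceRequest.class);
              small.setPriority(Priority.newInstance(1));
              small.setResourceName(ResourceRequest.ANY);
              small.setCapability(Resource.newInstance(1024, 1));   // 1 GB, 1 core
              small.setNumContainers(3);

              ResourceRequest large = Records.newRecord(ResourceRequest.class);
              large.setPriority(Priority.newInstance(1));           // same priority...
              large.setResourceName(ResourceRequest.ANY);
              large.setCapability(Resource.newInstance(4096, 4));   // ...different capability
              large.setNumContainers(2);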

          Sandy Ryza added a comment -

          The AMRMClient APIs should be added to ensure that this does not happen.

          Sorry, my sentence made no sense. I mean: the AMRMClient APIs that will be modified to support node-specific requests should be made to work in a way that ensures these invalid requests do not happen.

          Alejandro Abdelnur added a comment -

          Agree with Sandy. The low-level API contract requires a deep understanding of how things work; the AMRMClient layer already hides/handles much of that complexity, and this particular feature should be handled there in a similar way. Sandy Ryza, please open a JIRA for it.

          Bikas Saha added a comment -

          Of course, changing the AMRMClient to support this would be a logical extension.
          Does that mean that the server can afford to not check for inconsistent requests that will result in a bad state for the server and/or incorrect results for the users? Perhaps only when AMRMClient is the only entity that is ever going to talk to the server. Is that the case? Not doing checks by assuming that pre-conditions will hold is a slippery path IMO.
          Currently, when ApplicationMasterService calls scheduler.allocate, the scheduler can throw an exception about invalid allocations, which gets returned to the client. So it's fairly easy to solve this in YARN-394.

          Sandy Ryza added a comment -

          I think the server should check weird requests and state. My point was mainly that we haven't been doing these checks up to this point, so I didn't think we should be blocked on it. Especially given that, in this case, the consequences of invalid requests are for the apps submitting them. There will be no other bad state inside the RM.

          That said, there is some sanity checking that we can do at request time, such as making sure that if a rack has disableAllocation turned on, the number of containers requested on nodes under it must sum to at least the rack's number of containers. I can add this to the patch.
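          A minimal sketch of that check (plain Java, not taken from any patch here): with relaxation disabled on the rack, containers can only come from the node-level requests beneath it, so their counts must cover the rack's count:

              import java.util.Map;

              public class RackRequestCheck {
                /**
                 * Hypothetical request-time validation: rackContainers is the container count
                 * on a rack-level request with disableAllocation=true; nodeContainersOnRack
                 * maps each node-level request under that rack to its container count.
                 */
                static boolean rackRequestSatisfiable(int rackContainers,
                                                      Map<String, Integer> nodeContainersOnRack) {
                  int sum = 0;
                  for (int count : nodeContainersOnRack.values()) {
                    sum += count;
                  }
                  // with locality relaxation disabled, only the listed nodes can satisfy the rack
                  return sum >= rackContainers;
                }
              }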

          Bikas Saha added a comment -

          My point was mainly that we haven't been doing these checks up to this point, so I didn't think we should be blocked on it.

          Would be great if you could help enumerate cases you know of. We can add them to YARN-394 for tracking.

          Recently, in YARN-193, we started throwing InvalidResourceRequest in the RM for requests that are invalid (more than the max resource allowed, etc.). So that takes care of one of the known cases where checks were not being performed. The other case is when a resource request is valid but later becomes invalid, mainly related to nodes being lost: e.g. when a high-memory machine is lost, or when specific resources were requested (this JIRA) and they become unavailable later on. These cases motivated YARN-394 and are described therein. So we are tracking towards sanity checking IMO. In YARN-142 etc. we are changing protocols so that such exceptions are visible to users and they can act on them programmatically.

          Capacity scheduler changes are targeted for YARN-398.

          The title of that JIRA says whitelisting and blacklisting of nodes, so you may want to check with Arun C Murthy whether the intent of that JIRA matches what you think it is.

          Sandy Ryza added a comment -

          For YARN-394, the case you mentioned in which a high memory machine is lost is the one I was thinking of. Adding it to the JIRA description. Filed YARN-664 for the same-priority, different-capability case.

          Sorry to keep vacillating on this, but, in the fair scheduler at least, I don't think it makes sense to include the check for high #containers on requests with disableAllocation. It requires a new data structure that maps racks to nodes and checks for all the nodes on a rack with each allocation. I think this is an unnecessary performance hit. I did, however, add in a safeguard to make sure that disabling locality on an existing request won't leave around unsatisfiable reservations, so it shouldn't be possible for an app's mistakes to result in any inconsistent RM state.

          So you may want to check with Arun C Murthy if the intent of that jira matches what you think it is.

          On a comment here on April 4, Arun mentioned that he would be happy to repurpose it. If this is no longer the case, I'd be happy to file a new JIRA, but I don't feel confident in my knowledge of the capacity scheduler to make the right changes for it.

          I'm uploading a new patch that includes the name change to disableAllocation, the test Bikas asked for related to cancelling strict locality, and a few other fixups.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12582732/YARN-392-3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/913//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/913//console

          This message is automatically generated.

          Alejandro Abdelnur added a comment -

          Sandy, patch looks good to me, only NIT is that the ResourceRequest does not have javadocs.

          Sandy Ryza added a comment -

          Updated patch adds javadocs

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12583209/YARN-392-4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/928//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/928//console

          This message is automatically generated.

          Bikas Saha added a comment -

          If we allow Disable-node, I like disableAllocation as the name. Otherwise, fallThrough or relaxLocality both seem good to me.

          +        if (localRequest != null && localRequest.getDisableAllocation()) {
          +          continue;
          +        }
          

          What would it mean to set the flag at the node?

          Sandy Ryza added a comment -

          It would mean "Don't give me a container at this priority on this node".

          Bikas Saha added a comment -

          This is confusing me. The purpose of this JIRA is to add support for scheduling on specific nodes/racks, i.e. don't relax locality automatically. In that context, what does it mean to disable allocation of containers on a node, which sounds like blacklisting the node?

          Sandy Ryza added a comment -

          We are implementing the approach outlined by Arun in his first comment on YARN-398. Although it is not the primary goal, the approach does allow for node/rack blacklisting, and loses nothing by doing so. Even if we were to say that you can't set the disable-allocation flag on node-level requests, it would still be possible to blacklist racks by setting the disable flag on a rack and submitting node requests for nodes under it. It would also still be possible to blacklist nodes by whitelisting every other node on its rack. Allowing the disable-allocation flag on node-level requests just makes the semantics more consistent.

          I'll update the title of the JIRA to better reflect this.

          Sandy Ryza added a comment -

          it would still be possible to blacklist racks by setting the disable flag on a rack and submitting node requests for nodes under it.

          By which I mean: it would still be possible to blacklist racks by setting the disable flag on a rack and submitting *no* node requests for nodes under it.

          Alejandro Abdelnur added a comment -

          Latest patch LGTM. Bikas, do Sandy's responses address your concerns? I'd like to get this in so we can move to the next step, which is getting this exposed in the client API.

          Bikas Saha added a comment -

          I am not sure how we are making the semantics consistent by overloading one thing for two purposes. When the flag is set at a network hierarchy level, it means the scheduler will not relax locality beyond that level. The same flag can also be used to blacklist locations.
          The most common use case of blacklisting is to specify a set of nodes on which no allocations should be made (e.g. they are badly behaving nodes). How does this scheme address that case? Will we have to specify the same blacklist information for every priority that is used by an application (because resource requests are per priority)? Every time an app uses a new priority, will we have to issue a new set of resource requests to blacklist at that priority?

          Sandy Ryza added a comment -

          As I mentioned, any mechanism that allows whitelisting also allows blacklisting by definition, as it is always possible to whitelist all the nodes except the one you don't want. So I don't see it as overloading.

          The most common use case of black listing is to specify a set of nodes on which no allocations should be made

          I am not suggesting that this blacklisting mechanism is there to address the most common case. Just as the most common use of delay scheduling is probably through a cluster-wide setting, yet allowing customization on specific requests in the way you suggested earlier on this thread would still be useful, the ability to blacklist nodes for specific requests does not preclude a cluster-wide setting to address the common case.

          Does the following seem like a fair representation of the mechanics of the alternative? When a node-level request comes with disableAllocation=true, an InvalidAllocationException is thrown. When a rack-level request comes with disableAllocation=true, we check to make sure that there are node-level requests under it; if not, an InvalidAllocationException is thrown. When a node-level request is cancelled, we check the rack above it to make sure that, if its disableAllocation=true, there are other non-zero node-level requests below it; if not, we throw an InvalidAllocationException. To me, this seems more complicated and gives up functionality unnecessarily. That said, if we can get some consensus on an alternative, I am happy to implement that instead.
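          For concreteness, those checks might look roughly like this (a self-contained sketch; InvalidAllocationException is just the name used above, and the structure is illustrative):

              class InvalidAllocationException extends Exception {
                InvalidAllocationException(String msg) { super(msg); }
              }

              class AlternativeChecks {
                // under the alternative, the flag is simply not supported at node level
                void checkNodeRequest(boolean disableAllocation) throws InvalidAllocationException {
                  if (disableAllocation) {
                    throw new InvalidAllocationException("flag not supported on node-level requests");
                  }
                }

                // a strict rack-level request is only meaningful with node-level requests under it;
                // the same check would be re-run whenever a node-level request is cancelled
                void checkRackRequest(boolean disableAllocation, int nodeRequestsUnderRack)
                    throws InvalidAllocationException {
                  if (disableAllocation && nodeRequestsUnderRack == 0) {
                    throw new InvalidAllocationException(
                        "strict rack-level request with no node-level requests under it");
                  }
                }
              }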

          Vinod Kumar Vavilapalli added a comment -

          The most common use case of black listing is to specify a set of nodes on which no allocations should be made

          I am not suggesting that this blacklisting mechanism is there to address the most common case. ...

          IIUC, there is no point in supporting blacklisting per resource-type. I don't see a use-case for it. When you blacklist a node or a rack, you blacklist it. You don't blacklist it for 5GB/5-core containers but want to use it for 1GB/1-core containers.

          Still catching up on the discussion, but I wanted to say that this has gone on for too long. We should try and get this into 2.0.5.

          Sandy/Bikas, can we just focus this on 'white-listing' per resource type through the flag that was proposed (and which seemed to be the consensus earlier), and use YARN-395 for blacklisting? I can close YARN-398 as a duplicate.

          Sandy Ryza added a comment -

          We currently have a working patch that has gone through multiple phases of review. This patch implements the proposal made by Arun on YARN-398, which many comments led me to believe we had consensus on. The approach enables whitelisting by setting a disable-allocation flag on certain requests, so some form of "blacklisting" is a natural extension of it. The changes to the scheduler are about 10 lines. Modifying the proposal to support only whitelisting would require many additional changes, and do nothing to simplify the current changes.

          As I said, if everyone else participating agrees on these additional changes, I am happy to implement them. But my opinion is that the best way to get this into 2.0.5, both in terms of soundness of the approach and in terms of punctuality, is to go with what we have worked on so far.

          Alejandro Abdelnur added a comment -

          [~bikassaha], as Sandy Ryza points out, blacklisting is a natural consequence of whitelisting and vice versa. The driver of this JIRA is to enable specific node/rack allocation. This is achieved by leveraging the existing protocol/scheduler semantics via blacklisting of the rack and ALL. As we use the same type of request for node/rack/ALL, doing the blacklisting at the node comes as a freebie, not an intended feature. In this case, the AM is the one that decides to do the node blacklisting; it has nothing to do with the health of the node (that remains the RM's responsibility).

          From my end, I don't have an immediate use for app-level node blacklisting. And Sandy indicates that modifying the scheduler not to handle node blacklisting will require additional changes. If this is a concern, we could do that as a sanity check in the RM when resource requests arrive (if I recall correctly, there is a JIRA for this; I've done a quick search but could not find it).

          Bikas Saha added a comment -

          Alejandro Abdelnur, we were in agreement on the delayed locality relaxation approach, where we wanted to add a time interval that specifies how long to wait before doing the relaxation. https://issues.apache.org/jira/browse/YARN-392?focusedCommentId=13639491&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13639491
          At that point there was no ambiguity about the meaning of the new field. It was then suggested that implementing the time-interval version of it would be hard, and so we decided to do the boolean version. Now the boolean has an ambiguity, because in a non-intended way it seems like it would support blacklisting too. A boolean by definition implies an inverse relation.
          Sandy Ryza, I meant blacklisting at the app level (and not cluster-wide) vs blacklisting at different RRs.

          I am not advocating that using the non-intended aspect of the boolean is not useful. What I don't want is additional ambiguity and overload in the protocol. If we can resolve the ambiguity/overload with a single boolean, then it would be great to see a crisp definition of it. If not, and if we want boolean-based per-RR blacklisting, then perhaps we can add a relaxLocality flag that affects only locality relaxation and a blacklist flag that does blacklisting per RR. These may seem duplicative, but in the user's mind and in the scheduler code their definitions are crisp. The new flag would be a different JIRA though.

          The way the code is written in the last patch I saw, the only change would be to not check the flag at the node level. As far as the sanity checking is concerned, it would be great to have these and other sanity checks I have already mentioned in previous comments. Sandy responded that those checks would be unnecessary overhead.

          Sandy Ryza added a comment -

          The crisp definition I had in mind was: "if disableAllocation=true for a ResourceRequest, don't use that ResourceRequest by itself for an allocation." The alternative is "if disableAllocation=true for a ResourceRequest at a network hierarchy level, then do not relax locality beyond that level." I think both of these are crisp, and I would be ok with both of them. How about:
          I remove that line in the scheduler that honors the property on node-level requests. We document that the flag is not supported on node-level requests. If we wish to, in a followup JIRA we can decide that setting the flag on a node-level request warrants an exception. Or, in a followup JIRA, we can decide that setting the flag on a node-level request is supported. Neither of these future changes will be a backwards-incompatible change.
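          For reference, under the second definition a strict request for a single node would be expressed roughly as below. This is only a sketch: the setResourceName/setRelaxLocality names and newInstance factories are assumptions about the final API, and node1/rack1 are made-up names.

              Priority pri = Priority.newInstance(1);
              Resource cap = Resource.newInstance(1024, 1);

              // node-level request: the only place we want the container
              ResourceRequest nodeReq = Records.newRecord(ResourceRequest.class);
              nodeReq.setPriority(pri);
              nodeReq.setResourceName("node1");
              nodeReq.setCapability(cap);
              nodeReq.setNumContainers(1);
              nodeReq.setRelaxLocality(true);   // default; not honored at node level per the above

              // rack-level request: do not fall back to other nodes on node1's rack
              ResourceRequest rackReq = Records.newRecord(ResourceRequest.class);
              rackReq.setPriority(pri);
              rackReq.setResourceName("/rack1");
              rackReq.setCapability(cap);
              rackReq.setNumContainers(1);
              rackReq.setRelaxLocality(false);

              // ANY-level request: do not fall back to other racks either
              ResourceRequest anyReq = Records.newRecord(ResourceRequest.class);
              anyReq.setPriority(pri);
              anyReq.setResourceName(ResourceRequest.ANY);
              anyReq.setCapability(cap);
              anyReq.setNumContainers(1);
              anyReq.setRelaxLocality(false);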

          Arun C Murthy added a comment -

          The alternative is "if disableAllocation=true for a ResourceRequest at a network hierarchy level, then do not relax locality beyond that level."

          This is exactly what YARN-398 does too.

          I buy Bikas's argument that blacklist should be separate from whitelist and it shouldn't be per ResourceRequest.

          Sandy Ryza added a comment -

          Uploaded a patch that takes away node-level blacklisting and changes the name of the flag accordingly to "relax locality".

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12583887/YARN-392-5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

          org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/960//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/960//console

          This message is automatically generated.

          Sandy Ryza added a comment -

          Weird, those tests passed locally for me. Will look into it.

          Alejandro Abdelnur added a comment -

          Glad to see we've reached consensus, thanks.

          Sandy Ryza added a comment -

          Looks like I hadn't rebased like I thought I had. Uploaded a new patch.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12583920/YARN-392-6.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/963//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/963//console

          This message is automatically generated.

          Alejandro Abdelnur added a comment -

          LGTM, +1. Bikas Saha, Arun C Murthy or Vinod Kumar Vavilapalli, any concern with the latest patch?

          Bikas Saha added a comment -

          in a followup JIRA we can decide that setting the flag on a node-level request warrants an exception. Or, in a followup JIRA, we can decide that setting the flag on a node-level request is supported. Neither of these future changes will be a backwards-incompatible change.

          Makes sense to me.

          Do we mean to that level and beyond? E.g. if the flag is set on a rack, then it implies locality won't be relaxed for that rack and *, right? That's what the code in the patch does (continue if rack is false). Or does the flag need to be set at rack AND * when asking for a specific node?

          +   * For a request at a network hierarchy level, set whether locality can be relaxed
          +   * to that level.
          

          It might help if we write the example in the javadoc by specifying how to use the flag to enable strict node locality. We don't have code checks for what is legal wrt this flag, but we should spell it out in the javadoc.

          +   * 
          +   * For example, if the flag is off on a rack-level <code>ResourceRequest</code>,
          +   * containers at that request's priority will not be assigned to nodes on that
          +   * request's rack unless requests specifically for those nodes have also been
          +   * submitted.
          +   * 
          +   * If the flag is off on an any-level <code>ResourceRequest</code>, containers at
          +   * that request's priority will only be assigned on racks for which specific
          +   * requests have also been submitted.
          

          My grey cells are not what they used to be. Can you please help me grok the logic in the following statement?

          +    return anyRequest.getNumContainers() > 0 &&
          +        (nodeRequest == null || nodeRequest.getRelaxLocality()) &&
          +        (anyRequest.getRelaxLocality() ||
          +            (rackRequest != null && rackRequest.getNumContainers() > 0)) &&
          +        (rackRequest == null || rackRequest.getRelaxLocality() ||
          +            (nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
                   Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
          -            request.getCapability(), node.getRMNode().getTotalCapability());
          +            anyRequest.getCapability(), node.getRMNode().getTotalCapability());
          

          You mean node2 right?

          +    // then node1 should get the container
          +    scheduler.handle(node2UpdateEvent);
          +    assertEquals(1, app.getLiveContainers().size());
          
          Sandy Ryza added a comment -

          We mean to that level and beyond?

          You're right. I'll update the javadoc to better reflect the code.

          It might help if we write the example in the javadoc by specifying how to use the flag to enable strict node locality.

          Agreed.

          My grey cells are not what they used to be. Can you please help me grok the logic in the following statement?

          I'll add in some comments.
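          For reference, one way the comments might read against the expression quoted above (a sketch, not the exact comments that went into the next patch):

              return
                  // there must be outstanding containers at this priority
                  anyRequest.getNumContainers() > 0 &&
                  // a node-level request that has relaxLocality turned off is not used for
                  // allocation on this node (its node-level meaning is still under discussion)
                  (nodeRequest == null || nodeRequest.getRelaxLocality()) &&
                  // if relaxation is off at the *-level, this node's rack must itself have
                  // a non-zero request
                  (anyRequest.getRelaxLocality() ||
                      (rackRequest != null && rackRequest.getNumContainers() > 0)) &&
                  // if relaxation is off at the rack level, this specific node must itself
                  // have a non-zero request
                  (rackRequest == null || rackRequest.getRelaxLocality() ||
                      (nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
                  // and the node must be large enough for the requested capability
                  Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
                      anyRequest.getCapability(), node.getRMNode().getTotalCapability());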

          You mean node2 right?

          Right.

          Sandy Ryza added a comment -

          Uploaded a new patch to address Bikas' comments.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12584181/YARN-392-7.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/978//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/978//console

          This message is automatically generated.

          Bikas Saha added a comment -

          Minor nits:
          You would want to put ResourceRequest#ANY in the javadoc to be clear what "any" means.

          Can anyRequest be null? The others are checked for null, but this one isn't.
          Typo in the comment below: "non-zero" is duplicated.

          +    return
          +        // There must be outstanding requests at the given priority:
          +        anyRequest.getNumContainers() > 0 &&
          +        // If locality relaxation is turned off at *-level, there must be a non-zero
          +        // non-zero request for the node's rack:
          
          Bikas Saha added a comment -

          btw, what are the plans for the capacity scheduler?

          Sandy Ryza added a comment -

          Can anyRequest be null? The others are checked for null, but this one isn't.

          In the contexts in which the method is called, it can't be null, but I'll add a check to be defensive.

          Will fix the comments as well.

          Based on Arun's April 4th comment, my understanding was that capacity scheduler work would be done in YARN-398.

          Sandy Ryza added a comment -

          Uploaded a patch that addresses Bikas's latest comments.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12584541/YARN-392-8.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/993//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/993//console

          This message is automatically generated.

          Sandy Ryza added a comment -

          Bikas Saha, thanks for taking the time to iterate on all this. Does the latest patch address all of your concerns?

          Bikas Saha added a comment -

          Sorry, I got caught up with other stuff. I was waiting for Arun C Murthy to chime in and make sure he is on the same page.

          Alejandro Abdelnur added a comment -

          Bikas Saha, Arun C Murthy, all comments have been addressed; are there any new ones? Otherwise I'd like to commit this to get things moving on the AM client API side.

          Arun C Murthy added a comment -

          +1 for the API - I haven't reviewed the FS changes though. Thanks Sandy.

          Alejandro Abdelnur added a comment -

          Thanks Sandy. Thanks Bikas, Vinod & Arun for looking at it. Committed to trunk and branch-2.

          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3821 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3821/)
          YARN-392. Make it possible to specify hard locality constraints in resource requests. (sandyr via tucu) (Revision 1488326)

          Result = SUCCESS
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1488326
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourceRequestPBImpl.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #227 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/227/)
          YARN-392. Make it possible to specify hard locality constraints in resource requests. (sandyr via tucu) (Revision 1488326)

          Result = SUCCESS
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1488326
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourceRequestPBImpl.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1417 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1417/)
          YARN-392. Make it possible to specify hard locality constraints in resource requests. (sandyr via tucu) (Revision 1488326)

          Result = FAILURE
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1488326
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourceRequestPBImpl.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1443 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1443/)
          YARN-392. Make it possible to specify hard locality constraints in resource requests. (sandyr via tucu) (Revision 1488326)

          Result = SUCCESS
          tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1488326
          Files :

          • /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourceRequestPBImpl.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java
          • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java

            People

            • Assignee: Sandy Ryza
            • Reporter: Bikas Saha
            • Votes: 0
            • Watchers: 23
