Details

    • Hadoop Flags:
      Reviewed

      Description

      While preempting containers based on the queue's ideal assignment, we may need to consider preempting the lower-priority applications' containers first.

      1. YARN-2009.0001.patch
        112 kB
        Sunil G
      2. YARN-2009.0002.patch
        117 kB
        Sunil G
      3. YARN-2009.0003.patch
        122 kB
        Sunil G
      4. YARN-2009.0004.patch
        124 kB
        Sunil G
      5. YARN-2009.0005.patch
        127 kB
        Sunil G
      6. YARN-2009.0006.patch
        133 kB
        Sunil G
      7. YARN-2009.0007.patch
        133 kB
        Sunil G
      8. YARN-2009.0008.patch
        142 kB
        Sunil G
      9. YARN-2009.0009.patch
        146 kB
        Sunil G
      10. YARN-2009.0010.patch
        145 kB
        Sunil G
      11. YARN-2009.0011.patch
        144 kB
        Sunil G
      12. YARN-2009.0012.patch
        146 kB
        Sunil G
      13. YARN-2009.0013.patch
        147 kB
        Sunil G
      14. YARN-2009.0014.patch
        147 kB
        Sunil G
      15. YARN-2009.0015.patch
        147 kB
        Sunil G
      16. YARN-2009.0016.patch
        156 kB
        Sunil G
      17. YARN-2009.branch-2.8.001.patch
        153 kB
        Eric Payne
      18. YARN-2009.branch-2.8.002.patch
        154 kB
        Eric Payne

        Issue Links

          Activity

          curino Carlo Curino added a comment -

          If I am not mistaken this is what is happening...

          The policy picks from the queue in reverse order (i.e., least priority first; since this is FIFO for now, the youngest app is picked as the first victim),
          next it tries to unreserve containers (as this is a free metadata-only operation), and then picks containers in reverse priority order.
          This is in getContainersToPreempt(..).

          Is this what you meant? Am I missing something?

          sunilg Sunil G added a comment -

          As per YARN-1963, jobs can be submitted to a queue with a priority.
          In that scenario, when a queue is identified to preempt a few containers from among its applications, job priority can also be considered.

          eepayne Eric Payne added a comment -

          Hi Sunil G. If I understand correctly, this Jira is not intended to implement preemption within a single queue. Rather, the intent is to change ProportionalCapacityPreemptionPolicy so that, when selecting containers to preempt from a queue, it chooses the lower-priority ones first. Is that correct?

          sunilg Sunil G added a comment -

          Yes, the idea is more or less the same.

          We had a prototype done on this, and along with ApplicationPriority, this can be brought in as a separate policy.
          However, points to discuss are:

          • Containers have to be selected from lower-priority applications based on the node locality constraints of the higher-priority application
          • Co-existence of this logic with node labels
          • AM containers have to be spared, etc.

          Please share your thoughts.

          curino Carlo Curino added a comment -

          Sunil,

          A thing to keep in mind is that any preemption action you initiate has a significant delay, i.e., we will see its effects only a while later, likely under somewhat changed cluster conditions, app needs, etc.

          For this reason we decided to maximize the flexibility of the application being preempted (allowing late binding of which containers to yield back), instead of constraining the requests with strict locality preferences.
          The intuition is that we have a better chance of being efficient on the preempted side than on the preempting side (already-running tasks with immediate impact vs. hypotheses about task locality for future containers).

          I don't have any strong evidence to back these intuitions (which are likely to hold for some workloads but probably not all), but I suggest you consider these concerns, and maybe devise some good experiments to
          test whether locality-centric preemption gets you the benefit you hope for (it is otherwise an unnecessary complication, with hard-to-understand interactions with fairness/priorities, etc.).

          Similar thoughts apply to node labels; however, I believe in that context the needs are likely to be more "stable" over time, so preempting in a label-aware manner might be good.

          my 2 cents,
          Carlo

          sunilg Sunil G added a comment -

          I agree with your thoughts, Carlo Curino. Locality-constraint-based policy making is, as you said, hypothetical, and with a given set of test experiments we can see how much value it adds. I am devising and working on some useful tests to see the advantage. However, I also felt this added consideration may help the cluster work better. But it now seems more complicated, as the weighting for choosing which container to preempt is not balanced or straightforward when all scenarios are considered.

          Scenario:
          a. A higher-priority application needs 7 containers.
          b. 2 apps at Lower priority have 4 containers (2 each), and 2 apps at Very Low priority have 4 containers (2 each).

          Possible behavior from the preemption policy can be:
          1. Spare AM containers (based on config).
          2. At Very Low priority, choose the application that was submitted last and claim 2 containers, then move to the next app at the same level.

          This may be the direct output we expect.

          However, a few thoughts:
          1. The higher-priority app may need containers on certain nodes (locality), but the preemption happened on other nodes, forcing a fallback to rack-local or even any.
          2. With node labels, it is even possible that the preempted containers fall under another label on which the demand can't be supplied.
          3. The user limit factor has to be respected during preemption (queue preemption already considers this with a config).
          4. A different example: a higher-priority application needs 2 containers of 6 GB each. One lower-priority application has 12 containers of 1 GB each; another lower-priority application has 2 containers of 6 GB each. Going by submission time, if we choose the 1st lower-priority app, we may kill more containers. Sometimes the wiser choice is to select the 2nd one. This is debatable.
          5. Taking the first example itself: we have 2 lower-priority apps to choose from, but based on submission time the 1st app is selected for preemption. It is possible that this app is more I/O bound and has finished a greater percentage of its work than the 2nd one, which was submitted earlier. So submission time alone may not be a good criterion; percentage of job completion can also be considered.

          My point is that there is no single dimension along which a decision can be made sequentially; some customers may prefer option 2 over option 1. Hence a policy with a lot of configuration may come in if we accept these as features. I would like to get your thoughts on this; as you said, this may not give a big win, but it can work as an enhancement.

          curino Carlo Curino added a comment -

          I agree with your observation... the set of invariants/semantics (queue capacity, max-capacity, user quotas, app priority, max-am-percentage, container sizes and multi-resource, etc.) crossed with preferences/optimizations (spare AMs, node labels, locality, minimizing job latency, etc.) makes for a vast space of possible policies... Handling the hierarchical nature of queues correctly makes the space even larger.

          Besides the challenge of writing uber-policies that can handle all of that, they are very hard to tune right. Even the simplistic preemption we have today confuses very competent users (I know this for a fact). I worry that more and more complexity will become unmanageable for most. In a sense I am growing fond of a notion of "explainability" of system behavior, which favors systems whose behavior one can easily understand/predict (at the cost of some optimality).

          To this end, our cut-point in the early design of preemption was to say: "preemption should only kick in to correct large imbalances, and operate on a rather slow time-scale". The idea was, for example, that if I am preempting 1k containers for you to get your capacity, locality would matter less... and so would many other minor issues like local priorities, container sizes, etc.

          Overall, I think we should be use-case driven. If there is a clear "need" for complexity to cope with observed issues I think we can add it, but I would suggest we refrain from adding too many knobs based on hypothetical scenarios. If a need is not present yet, I would propose requiring a "sizeable win" as the bar for adding knobs: if we can demonstrate on some non-trivial experimental setup that a knob delivers substantial value, then maybe it's OK.

          eepayne Eric Payne added a comment -

          If we are to choose the less complicated route, I believe that, at the very least, when ProportionalCapacityPreemptionPolicy determines that queueA needs to give up some containers, it should first select containers from the lowest priority apps.

          sunilg Sunil G added a comment -

          Thank you Carlo Curino for the thoughts.
          I understand the complexity for users when they experience preemption of a few containers; it can be tough for them to understand why a given container was preempted and the reasons for it. Even if only the timestamp is considered (and, to an extent, the user-limit factor now), that is already hard to express through logs.

          Hence, solving some of the small imbalances I mentioned may not help users much at a larger level. Based on use cases we can check later whether these are needed. Coming to the focus of this JIRA: within a queue, if slow, low-priority applications are running and consuming all resources, it will be good if we can make some space by preempting the lower-priority ones. This preemption can be done within a queue. We have seen lower-priority applications taking more of the cluster while higher-priority applications wait a long time to launch. Please share your thoughts on this.

          curino Carlo Curino added a comment -

          Sunil G, pardon the long delay... From what you say it seems like the priority issue within a queue is important for you and you observe non-trivial delays.
          If that is the case, I think it is fine to venture into adding within-queue cross-app preemption. I would argue in favor of a conservative policy with several built-in
          dampers (like max per-round preemptions, dead zones, fraction of imbalance) like we did for cross-queue. Also we should be careful not to make it too expensive
          (if we have thousands of apps in the queues, we should be mindful not to overload the RM with costly rebalancing algorithms and the extra scheduling decisions that
          derive from preemption).

          What Eric Payne says also makes sense: if preemptions are triggered by cross-queue imbalances, it would be good to "spend" them to correct the issue you observed.

          eepayne Eric Payne added a comment -

          Sunil G, pardon the long delay... From what you say it seems like the priority issue within a queue is important for you and you observe non-trivial delays.
          If that is the case, I think it is fine to venture into adding within-queue cross-app preemption.
          ...
          if preemptions are triggered by cross-queue imbalances, it would be good to "spend" them to correct the issue you observed.

          Thanks a lot Carlo Curino and Sunil G. Good discussion.

          I feel that we should separate out these two concepts into separate JIRAs:

          • In-queue container preemption based on app priority
          • Cross-queue preemption that takes into consideration the priority of the container(s) being preempted, compared among all containers in the preempted queue.

          They have their own sets of challenges, and each is sufficiently complicated that it seems like a logical separation.

          Regarding unnecessary preemption because of locality, labels, etc., I think the preemption monitor now considers labels when determining which containers to preempt, so that should be extendable for priority, but locality is still an issue.

          And, of course, all of this will probably have to be re-evaluated in light of HDFS-4108.

          eepayne Eric Payne added a comment -

          And, of course, all of this will probably have to be re-evaluated in light of HDFS-4108.

          Sorry, that should have been YARN-4108.

          sunilg Sunil G added a comment -

          Thanks Eric Payne for sharing the thoughts.

          I feel that we should separate out these two concepts into separate JIRA

          Yes, these 2 cases will be handled separately. And yes, we need to watch the progress of YARN-4108, which tries to solve many of these issues; I think a great deal of the locality concern can also be tackled there.
          So I think it is better to take this ticket forward along with YARN-4108, correct, because we can leverage the advantages you mentioned.

          eepayne Eric Payne added a comment -

          Thanks Sunil G. One more note is that YARN-2113 was also raised to handle in-queue preemption, but not specifically for purposes of job-priority. I think it was more to address fairness in the per-user headroom.

          sunilg Sunil G added a comment -

          Hi Eric Payne
          Thank you. I guess it is more about user-fairness-based preemption, so that if some resource is grabbed by one user, it can be reclaimed. Priority is another policy to add to in-queue preemption.
          I think these policies can be pluggable, and with the same framework users can apply these triggers to achieve their use cases. I think, along with YARN-4108, the in-queue preemption framework can be added first, and separate triggers (priority, user-limit) can be added as pluggable entities. Thoughts?

          leftnoteasy Wangda Tan added a comment -

          Hi Sunil G/Eric Payne.

          I was thinking YARN-4108 could resolve it, but I found that the original plan (see design doc v2) is not doable in the short term. The v3 I uploaded only handles part of the original issue (avoiding unnecessary preemption) and leaves how to select to-be-preempted containers as-is.

          If you have time, could you take a look at the latest proposal on YARN-4108? I think how to choose to-be-preempted containers should be handled inside PCPP by different policies. We've put too much logic (like sorting by FIFO order, not preempting AM containers, etc.) into PCPP; maybe it's time to clean it up into a better form.

          sunilg Sunil G added a comment -

          Thanks Wangda Tan for sharing the updates. I will also take a look at YARN-4108.

          I think how to choose to-be-preempted containers should be handled inside PCPP by different policies

          As a first step, this will be really helpful and will make the code easier to understand. For in-queue preemption, I will work on an approach similar to what is taken in PCPP and will post a doc here. I think it can be another policy, independent of PCPP, but sharing as much common code as possible.

          sunilg Sunil G added a comment -

          Attaching a new patch with a few more UT cases; I will add some more cases in the next version of the patch.

          cc/Wangda Tan and Eric Payne

          eepayne Eric Payne added a comment -

          Sunil G, Thanks for providing YARN-2009.0001.patch.

          Unfortunately, FiCaSchedulerApp.java didn't apply cleanly to the latest trunk.

          Also, I get compilation errors. Still investigating:

          [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hadoop-yarn-server-resourcemanager: Compilation failure: Compilation failure:
          [ERROR] /hadoop/source/YARN-4945/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/FifoIntraQueuePreemptionPolicy.java:[249,14] cannot find symbol
          [ERROR] symbol:   method getTotalPendingRequests()
          [ERROR] location: variable app of type org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp
          [ERROR] /hadoop/source/YARN-4945/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/FifoIntraQueuePreemptionPolicy.java:[258,14] cannot find symbol
          [ERROR] symbol:   method getTotalPendingRequests()
          [ERROR] location: variable app of type org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp
          
          sunilg Sunil G added a comment -

          Ah. I think my branch needs a rebase. Lemme get into that.

          eepayne Eric Payne added a comment -

          Sunil G, please note that your patch depends on FiCaSchedulerApp#getTotalPendingRequests, but that was removed today by YARN-3141. CC-ing Wangda Tan

          leftnoteasy Wangda Tan added a comment -

          Eric Payne, oh, it was a method that nobody was using.

          Sunil G, could you look again to see if getTotalPendingRequests is really required? I think it only needs the per-partition pending resources for an app.

          sunilg Sunil G added a comment -

          Yes, I can avoid using that API. Will update with a new patch shortly.

          eepayne Eric Payne added a comment -

          Sunil G / Wangda Tan
          I am still in the middle of reviewing the patch, but I have a couple of overall concerns about the design of FifoIntraQueuePreemptionPolicy#computeAppsIdealAllocation

          • If we will be combining FIFO-priority and FIFO-MULP preemption, then I don't think the idealAssigned values can be calculated independently of each other:
            • I think that all apps in a queue should be grouped according to user Map<user, applications>
            • I think there should be a separate TAMinUserLimitPctComparator that calculates underserved users based on min user limit percent.
              • Comparator would try to balance MULP across all users like the Capacity Scheduler does
            • I think TAPriorityComparator should then only be given apps from the same user.
          • Once we have idealAssigned per user, we can then divide that up among the apps belonging to that user.
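
          A minimal sketch of the per-user grouping described above (assumed/simplified code, not the actual patch; only the FiCaSchedulerApp accessors quoted elsewhere in this thread are relied on):

              // Group the queue's apps by user, then order each user's apps by priority,
              // so a priority comparator only ever compares apps of a single user.
              Map<String, List<FiCaSchedulerApp>> groupByUser(
                  Collection<FiCaSchedulerApp> apps) {
                Map<String, List<FiCaSchedulerApp>> appsByUser =
                    new HashMap<String, List<FiCaSchedulerApp>>();
                for (FiCaSchedulerApp app : apps) {
                  List<FiCaSchedulerApp> userApps = appsByUser.get(app.getUser());
                  if (userApps == null) {
                    userApps = new ArrayList<FiCaSchedulerApp>();
                    appsByUser.put(app.getUser(), userApps);
                  }
                  userApps.add(app);
                }
                for (List<FiCaSchedulerApp> userApps : appsByUser.values()) {
                  Collections.sort(userApps, new Comparator<FiCaSchedulerApp>() {
                    @Override
                    public int compare(FiCaSchedulerApp a, FiCaSchedulerApp b) {
                      // Higher priority first within a single user's applications.
                      return b.getPriority().getPriority() - a.getPriority().getPriority();
                    }
                  });
                }
                return appsByUser;
              }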
          leftnoteasy Wangda Tan added a comment -

          Eric Payne,

          ... then I don't think idealAssigned can be calculated independently from each other ...

          Actually I was thinking the same thing, that we should compute idealAssigned for each user, when I was reviewing YARN-2069. But I realized we may not need to; let me explain a little bit:
          The computed user-limit resource in the existing CS is used as an upper bound on how much each user should get; there is no "lower bound user-limit resource" in reality.

          I think all of us agree that the behavior of preemption should be consistent with the behavior of scheduling; any mismatch between the two could lead to excessive preemption.

          When FIFO (and also FIFO + PRIORITY) policy is enabled, an example of existing CS's behavior is:

          Queue's user-limit-percent = 33
          Queue's used=guaranteed=max=12. 
          There are 3 users (A, B, C) in the queue; the order of applications is A/B/C.
          Applications from users A and C are asking for more resource, and user B's application is already satisfied.
          
          So the computed user-limit-resource will be 6.
          
          Assume resource usages of A/B/C are 5/6/1, and A and C each have 1 pending resource.
          
          The actual user-ideal-assignment when doing scheduling is 6/6/0!
          (A can get the 1 additional resource, B will not change, and C can get nothing after that)
          

          So in other words, user-limit is just a cap in addition to FIFO (or FIFO + Priority) order.

          Back to the preemption patch, the pseudocode to compute application ideal allocation considering user limit would be:

          void compute-ideal-allocation-for-apps(List<Application> apps) {
              user-limit-resource = queue.get-user-limit-resource();
          
               // initialize all values to 0
              Map<String, Resource> user-to-allocated;
          
              for app in sort-by-fifo-or-priority(apps) {
                 if (user-to-allocated.get(app.user) < user-limit-resource) {
                      app.allocated = min(app.used + pending, user-limit-resource - user-to-allocated.get(app.user));
                      user-to-allocated.get(app.user) += app.allocated;
                 } else {
                       // skip this app because user-limit reached
                 }
              }
          }
          

          Please let me know about your thoughts.

          Thanks,
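
          For reference, a minimal runnable Java rendering of the pseudocode above, with simplified stand-in types (the App class and scalar resource values are assumptions, not the RM's classes):

              // Simplified stand-in for a scheduler app; resources are a single
              // scalar here only for brevity.
              static class App {
                String user;
                long used, pending;
                long idealAllocated;
              }

              // Apps are assumed to be pre-sorted in FIFO (or FIFO + priority) order.
              static void computeIdealAllocationForApps(List<App> apps,
                  long userLimitResource) {
                Map<String, Long> userToAllocated = new HashMap<String, Long>();
                for (App app : apps) {
                  Long prev = userToAllocated.get(app.user);
                  long allocatedForUser = (prev == null) ? 0L : prev.longValue();
                  if (allocatedForUser < userLimitResource) {
                    // Cap the app at whatever remains of its user's limit.
                    app.idealAllocated = Math.min(app.used + app.pending,
                        userLimitResource - allocatedForUser);
                    userToAllocated.put(app.user, allocatedForUser + app.idealAllocated);
                  }
                  // else: skip this app because the user-limit is already reached.
                }
              }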

          leftnoteasy Wangda Tan added a comment -

          And one additional note about why balancing user usage only in the preemption logic would cause excessive preemption.

          Continuing the above 3-user example.

          If the preemption policy computes and sets 4 (4 = 12 / 3) as the ideal allocation for users A/B/C, it may preempt some resource from A/B to make room for C. But when the scheduler does allocation, because A/B sit in front of the queue, the resource will come back to A/B again.

          sunilg Sunil G added a comment -

          Thanks Eric Payne and Wangda Tan. Attaching a new patch.

          Regarding the user-limit discussion, I think the general approach looks fine. I feel we can also add a dead-zone around user-limit; this can help to avoid thrashing scenarios. (We do need this for priority, as it is a kind of direct calculation.)

          I am also thinking of the case where we have a complete preemption use case: a few containers coming from inter-queue preemption, a few others from user-limit, and finally some more containers coming from the priority-based policy.

          I think we might need to improve preemption metrics here. I would like to design a preemption metrics module so that we can get clear information about how much preemption each module performs. Thoughts?

          eepayne Eric Payne added a comment -

          Wangda Tan, I am confused by your above example:

          Queue's user-limit-percent = 33
          Queue's used=guaranteed=max=12.
          There're 3 users (A,B,C) in the queue, order of applications are A/B/C
          ...
          So the computed user-limit-resource will be 6.
          ...
          The actual user-ideal-assignment when doing scheduling is 6/6/0 !

          If minimum-user-limit-percent == 33, why is the user-limit-resource == 6?

          Shouldn't idealAssigned be 4/4/4, not 6/6/0?

          leftnoteasy Wangda Tan added a comment -

          Eric Payne,

          The existing user-limit is computed by the following logic:

          user_limit = min{ max{ current_capacity / #active_users,
                                 current_capacity * user_limit_percent },
                            queue_capacity * user_limit_factor }

          In my above example, #active_users is 2 instead of 3 (because B has no more pending resource). The reason it uses #active_users is that the existing user-limit is used to balance available resources among active users; it doesn't consider the need to re-balance (via preemption) the existing usage of users. For intra-queue user-limit preemption to correctly balance usage between users, we need to fix the scheduling logic as well.
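
          Plugging the numbers from the earlier example into that formula (current_capacity = used = 12, #active_users = 2 since user B has no pending ask, user_limit_percent = 33, and assuming the default user_limit_factor of 1):

              user_limit = min{ max{ 12 / 2, 12 * 0.33 }, 12 * 1 }
                         = min{ max{ 6, ~4 }, 12 }
                         = 6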

          eepayne Eric Payne added a comment -

          In my above example, #active_users is 2 instead of 3 (because B has no more pending resource). The reason it uses #active_users is that the existing user-limit is used to balance available resources among active users; it doesn't consider the need to re-balance (via preemption) the existing usage of users. For intra-queue user-limit preemption to correctly balance usage between users, we need to fix the scheduling logic as well.

          I see. I wasn't suggesting that preemption should balance all users, only those that are asking.

          ...
              for app in sort-by-fifo-or-priority(apps) {
                 if (user-to-allocated.get(app.user) < user-limit-resource) {
                      app.allocated = min(app.used + pending, user-limit-resource - user-to-allocated.get(app.user));
                      user-to-allocated.get(app.user) += app.allocated;
          ...
          

          Yes, that would work. Thanks.

          leftnoteasy Wangda Tan added a comment -

          I see. I wasn't suggesting that preemption should balance all users, only those that are asking.

          But I think we need an overhaul of the user-limit-related logic in CS so we can better balance usage between users, which should be part of the intra-queue preemption story. We can discuss more once this JIRA is done.

          eepayne Eric Payne added a comment -

          • IntraQueuePreemptableResourceCalculator#computeIntraQueuePreemptionDemand:
            • Shouldn't the following be tq.getUsed() - tq.getActuallyToBePreempted()? tq.getGuaranteed() only returns the queue's guaranteed capacity but if apps in the queue are using extra resources, then you want to subtract from the total usage.
                      tq.setUnAllocated(Resources.subtract(tq.getGuaranteed(),
                          tq.getActuallyToBePreempted()));
              
            • MaxIgnoredOverCapacity
                      if (leafQueue.getUsedCapacity() < context
                          .getMaxIgnoredOverCapacityForIntraQueue()) {
                        continue;
                      }
              
              • Shouldn't this also take into consideration used capacity of all parent queues as well?
              • In any case, can we change the name of the config property and its getters?
                • CapacitySchedulerConfiguration#MAX_IGNORED_OVER_CAPACITY_FOR_INTRA_QUEUE / yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity_for_intra_queue / ProportionalCapacityPreemptionPolicy#maxIgnoredOverCapacityForIntraQueue
                • This is not really an "over capacity" thing. It's more of an "only start to preempt when queue is over this amount" thing. Maybe we could name it something like yarn.resourcemanager.monitor.capacity.preemption.ignore_below_percent_of_queue

          • FifoIntraQueuePreemptionPolicy#createTempAppForResourceCalculation
            • In the following code, instead of calling the containsKey or the get* method twice, you could just call the get method once, assign its output to a tmp var, and then, if the tmp var is not null, assign tmp to the resource var. That would be a little more efficient (a small sketch of this pattern appears after this list).
                    if (app.getTotalPendingRequestsPerPartition().containsKey(partition)) {
              ...
                    if (null != app.getAppAttemptResourceUsage().getUsed(partition)) {
              ...
                    if (null != app.getCurrentReservation(partition)) {
              
            • Should pending also be cloned?
                    TempAppPerPartition tmpApp = new TempAppPerPartition(app.getQueueName(),
                        app.getApplicationId(), Resources.clone(used),
                        Resources.clone(reserved), pending, app.getPriority().getPriority(),
                        app, partitions);
              

          • FifoIntraQueuePreemptionPolicy#computeAppsIdealAllocation:
            • Can you please change the following comment:
                  // Apps ordered from highest to lower priority.
              
              • to be something like this?
                    // Remove the app at the next highest remaining priority and process it.
                
            • Can you please change the word "size" to "resources"? When I first saw "container size", I thought it was calculating the size of each container.
                    // Calculate total selected container size from current app.
              
            • I don't think we want to do the following:
                    if (Resources.lessThanOrEqual(rc, partitionBasedResource, userHeadroom,
                        Resources.none())) {
                      continue;
                    }
              
              • If user1 has used the entire queue with a low priority app, user1's headroom will be 0. But, if that same user starts a higher priority app, that higher priority app needs to preempt from the lower priority app, doesn't it?
            • I assume that you will rework the idealAssigned logic to match Wangda Tan's algorithm that he provided above. That is, the algorithm that takes into account user-limit-resource

          • FifoIntraQueuePreemptionPolicy#getHighPriorityApps:
            • The comments in this method are the same as those in AbstractPreemptableResourceCalculator#getMostUnderservedQueues, but they don't apply for getHighPriorityApps.
            • getHighPriorityApps doesn't need to return ArrayList<TempAppPerPartition>. It will only be retrieving one app at a time. In AbstractPreemptableResourceCalculator#getMostUnderservedQueues, it's possible for 2 or more queues to be underserved by exactly the same amount, so all of the most underserved queues must be processed together. However, getHighPriorityApps is using the taComparator comparator. Even if apps are the same priority, one will have a lower app ID, so there will never be 2 apps that compare equally.
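
          As referenced in the lookup comment above, a hypothetical illustration of the single-lookup pattern (the surrounding variable names are assumptions, not the patch's actual code):

              // One map lookup instead of containsKey() followed by get().
              Resource pendingForPartition =
                  app.getTotalPendingRequestsPerPartition().get(partition);
              if (pendingForPartition != null) {
                pending = pendingForPartition;
              }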
          sunilg Sunil G added a comment -

          Thanks Eric Payne for the comments. They mostly make sense; I will upload a new patch soon.

          eepayne Eric Payne added a comment -

          Wangda Tan, I have some concerns about this algorithm from above:

              for app in sort-by-fifo-or-priority(apps) {
                 if (user-to-allocated.get(app.user) < user-limit-resource) {
                      app.allocated = min(app.used + pending, user-limit-resource - user-to-allocated.get(app.user));
                      user-to-allocated.get(app.user) += app.allocated;
                 } else {
                       // skip this app because user-limit reached
                 }
              }
          

          If Queue1 has 100 resources, and if user1 starts app1 at priority 1 that consumes the whole queue, won't user1's user-limit-resource be 100? Then, if user1 starts another app (app2) at priority 2, won't the above algorithm skip over app2 because user1 has already achieved its user-limit-resource?

          sunilg Sunil G added a comment -

          Hi Eric Payne

          Couple of doubts on the comments.

          Shouldn't the following be tq.getUsed() - tq.getActuallyToBePreempted()? tq.getGuaranteed() only returns the queue's guaranteed capacity but if apps in the queue are using extra resources, then you want to subtract from the total usage.

          If we pick getUsed, we will be working from a variable base value "used", so it will be hard to predict or analyze the preemption calculation. But I do agree that used can be more than guaranteed, and unallocated could even go negative in specific cases. So we could take a max(guaranteed, used). Will this be fine?
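
          For illustration only, a minimal sketch of what the max(guaranteed, used) variant might look like; it assumes the rc / clusterResource variables and the TempQueuePerPartition accessors already used by the surrounding patch code:

              // Seed "unallocated" from max(guaranteed, used) so it does not go
              // negative when the queue is already using more than its guarantee.
              Resource base = Resources.max(rc, clusterResource, tq.getGuaranteed(), tq.getUsed());
              tq.setUnAllocated(Resources.subtract(base, tq.getActuallyToBePreempted()));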

          Shouldn't this also take into consideration used capacity of all parent queues as well?

          I think we might need to consider only the LeafQueue, as we are working on each LeafQueue one by one for intra-queue preemption.

          If user1 has used the entire queue with a low priority app, user1's headroom will be 0. But, if that same user starts a higher priority app, that higher priority app needs to preempt from the lower priority app, doesn't it?

          I am doing this check in the first loop, which runs from the highest priority app to the lowest priority app, to calculate each app's idealAssigned. If an app's user-limit is already met, then we need not consider any more demand from that app; hence such apps' idealAssigned can be kept as 0. On the same lines, assume we are doing preemption here: app1 of user1 used the entire queue, and app2 of user2 asks for more resources. If we preempt some containers from app1, will the scheduler allocate them to app2 (provided there is some demand from other users)? If I am not wrong, it will not. Please correct me if I am wrong.

          I think this same scenario is mentioned by you in the above comments. Maybe I could consider app2's demand provided there are no other users waiting in the queue with apps.

          eepayne Eric Payne added a comment -

          So we could take a max(guaranteed, used). Will this be fine?

          I don't think so. If tq.getActuallyToBePreempted is non-zero, it represents the amount that will be preempted from what tq is currently using, not tq's guaranteed resources. The purpose of this line of code is to set tq's unallocated resources. But even if tq is below its guarantee, the amount of resources that intra-queue preemption should consider when balancing is not the queue's guarantee, it's what the queue is already using. If tq is below its guarantee, inter-queue preemption should be handling that.

          app1 of user1 used entire queue. app2 of user2 asks more resource

          The use case I'm referencing with this code does not involve 2 different users. It's about the same user submitting jobs of different priorities. If user1 submits a low priority job that consumes the whole queue, user1's headroom will be 0. Then, when user1 submits a second app at a higher priority, this code will cause the second app to starve because user1 has already used up its allocation.

          leftnoteasy Wangda Tan added a comment -

          Hi Eric Payne,

          If Queue1 has 100 resources, and if user1 starts app1 at priority 1 that consumes the whole queue, won't user1's user-limit-resource be 100? Then, if user1 starts another app (app2) at priority 2, won't the above algorithm skip over app2 because user1 has already achieved its user-limit-resource?

          Regarding your concern, I previously mentioned this in the pseudo code:

              // initial all value to 0
              Map<String, Resource> user-to-allocated;
          

          Initially user-to-allocated is set to 0, and when we loop over the applications we will get to app2 first, so app2's ideal-allocation becomes 100 and app1's ideal-allocation will be updated to 0 (assuming app2 has >= 100 pending resource).
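
          Editor's sketch (not part of the original comment): a toy Java trace of the loop above for exactly this scenario — one user u1 with user-limit-resource = 100, app2 (priority 2, pending 100) processed before app1 (priority 1, used 100). Names and numbers are illustrative and only one resource dimension is modeled.

              import java.util.HashMap;
              import java.util.Map;

              public class UserLimitTrace {
                public static void main(String[] args) {
                  int userLimitResource = 100;
                  Map<String, Integer> userToAllocated = new HashMap<>();
                  userToAllocated.put("u1", 0);

                  // {used, pending} in priority order: app2 first, then app1
                  String[] names = {"app2", "app1"};
                  int[][] apps = {{0, 100}, {100, 0}};

                  for (int i = 0; i < apps.length; i++) {
                    int allocated = userToAllocated.get("u1");
                    if (allocated < userLimitResource) {
                      int ideal = Math.min(apps[i][0] + apps[i][1], userLimitResource - allocated);
                      userToAllocated.put("u1", allocated + ideal);
                      System.out.println(names[i] + " ideal-allocation = " + ideal);
                    } else {
                      System.out.println(names[i] + " skipped (user-limit reached), ideal-allocation = 0");
                    }
                  }
                  // Prints: app2 ideal-allocation = 100, then app1 skipped,
                  // so preemption moves app1's containers toward app2 as described above.
                }
              }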

          Plz let me know if I missed anything.

          Thanks,

          eepayne Eric Payne added a comment -

          Wangda Tan, Thanks. I missed that.

          sunilg Sunil G added a comment -

          Thank you very much Eric Payne for sharing the thoughts.

          The purpose of this line of code is to set tq's unallocated resources. But even if tq is below its guarantee, the amount of resources that intra-queue preemption should consider when balancing is not the queue's guarantee, it's what the queue is already using. If tq is below its guarantee, inter-queue preemption should be handling that.

          I also thought about this along similar lines. But there was one point where I thought it may make more sense to use guaranteed. If the queue is using less than its capacity, our current calculation may consider more resources for idealAssigned per app. This may yield a smaller toBePreempted value per app (from lower priority apps). And that may be fine, because there is still some resource available in the queue for other high priority apps, so we may not need to preempt immediately. Does this make sense?

          The use case I'm referencing with this code does not involve 2 different users. It's about the same user submitting jobs of different priorities. If user1 submits a low priority job that consumes the whole queue, user1's headroom will be 0. Then, when user1 submits a second app at a higher priority, this code will cause the second app to starve because user1 has already used up its allocation.

          I think I understood your point. Let me make the necessary change.

          eepayne Eric Payne added a comment -

          I also thought about this along similar lines. But there was one point where I thought it may make more sense to use guaranteed. If the queue is using less than its capacity, our current calculation may consider more resources for idealAssigned per app. This may yield a smaller toBePreempted value per app (from lower priority apps). And that may be fine, because there is still some resource available in the queue for other high priority apps, so we may not need to preempt immediately. Does this make sense?

          Sunil G, I'm sorry, but I don't understand. Can you provide a step-by-step use case to demonstrate your concern?

          eepayne Eric Payne added a comment -

          Hi Sunil G. After thinking more about it, I think I would like to express my ideas about tq.unassigned in the following algorithm:

          tq.unassigned = tq.used
          while tq.unassigned
              for app : underservedApps
                  app.idealAssigned = app.used + app.pending // considering, of course, user-resource-limit, as Wangda defined it above
                  tq.unassigned -= app.idealAssigned
          

          My concern is that if 1) tq.guaranteed is used in the above algorithm instead of tq.used, and 2) tq.used is less than tq.guaranteed, then the above algorithm will want to ideally assign more total resources to all apps than are being used. If that happens, then when it comes time for the intra-queue preemption policy to preempt resources, it seems to me that the policy won't preempt enough resources.

          It seems to me that the intra-queue preemption policy should only be considering resources that are actually being used when deciding how much to preempt, not guaranteed resources.

          sunilg Sunil G added a comment - - edited

          Thanks Eric Payne for the detailed explanation.

          when it comes time for the intra-queue preemption policy to preempt resources, it seems to me that the policy won't preempt enough resources.

          Yes. I was also thinking along the same lines. If tq.used is less than tq.guaranteed, then we may do less preemption if we use tq.guaranteed. Since used is less than guaranteed, there are enough resources available for this queue, so high priority apps will automatically get those resources from the scheduler.
          But after thinking more, there is one case. Adding to the above mentioned scenario (tq.used is less than tq.guaranteed), assume another queue was over-utilizing resources, so the current queue has to wait till inter-queue preemption kicks in (to get guaranteed - used). If inter-queue preemption is turned off, then the current queue may not get those resources immediately. In such cases, I think we need to use tq.used. I would like to hear thoughts on this point. Eric Payne and Wangda Tan, please share your thoughts.

          sunilg Sunil G added a comment -

          Attaching new patch with user-limit support.

          I think FifoIntraQueuePreemptionPolicy.computeAppsIdealAllocation became more complex. I will try to simplify it in the next patch.

          leftnoteasy Wangda Tan added a comment -

          Thanks Sunil G,

          More comments:

          1) Config:
          In general, it's better to use "-" instead of "_". We made such mistakes in previous preemption settings; let's use the new naming convention for newly added configs.

          And we can add a prefix for intra-queue preemption related configs to better group these options.

          1.1 Rename select_based_on_intra_queue_policies to intra-queue-preemption.enabled? For the name select_based_on_intra_queue_policies you mentioned before, I would prefer to avoid words like "policy"; it is scary to the end user.
          1.2 ignore_preemption_below_used_capacity to intra-queue-preemption.minimum-threshold?
          1.3 max_allowable_preempt_limit_for_intra_queue to intra-queue-preemption.max-allowable-limit?

          And to me, all of the above options should have a default global option and a per-queue option; having different settings for different workloads/queues is one of the most important requirements of this feature to me.

          2) PreemptionCandidateSelector,

          Still have one TODO comment, plz delete if it is done.

          3) IntraQueueCandidatesSelector

          • fifoBasedPreemptionPolicy -> fifoPreemptionComputePlugin
          • 3. Loop through all partitions to calculate demand: demand is already calculated at this point; this step selects containers to be preempted, so please update the comment?
          • tryPreemptContainerAndDeductResToObtain: if it has the same logic, can we merge it with tryPreemptContainerAndDeductResToObtain of FifoCandidatesSelector, which can be moved to CapacitySchedulerPreemptionUtils?

          4) TempQueuePerPartition,

          • It seems to me unAllocated/selectedContainers are not required: selectedContainers is not used, and unAllocated is only used within the same code block, so I would prefer a local variable to store it.

          5) FifoIntraQueuePreemptionPolicy

          • Rename it to -Plugin?
          • Rename calculateSelectedResourcePerApp -> getAlreadySelectedPreemptionCandidateResource? ("SelectedResource" here is not very clear), and add proper comment
          • computeAppsIdealAllocation
            a. initialIdealAssigned -> appIdealAssigned
            b. resourceLimitPerUserForApp is not required, appAssigned can be reused
            c. remainingUsed -> appUsedExcludedSelected
            d. As mentioned in 4), tq.unAllocated could move out to queueTotalUnassigned
            e. computation of preemptionLimit should deduct already selected candidates?
            f. orderedApps is invented to avoid sorting, correct? If you need to sort it again, we don't need it.
            g. As you mentioned, this method is a little condensed; it's better to break it down into several child methods.
          • validateOutSameAppPriorityFromDemand
            a. My personal preference: use a TempAppPerPartition[] to store the data, and use two integers to indicate head and tail. The Java Iterator is not very readable to me
            b. Suggest moving it from TempQueuePerPartition to FifoIntraQueuePreemptionPolicy

          6) AbstractKeyParamsForPreemption -> something like BasePreemptionEntity? The "key param" is not quite understandable to me.

          7) intraQPreemptableAmountCalculator:

          • I think we may not need it to be separate now: we have -Plugin already, -Calculator is a little confusing next to -Plugin regarding responsibility, and it has only one public method. I think we can move it to IntraQueueCandidatesSelector, thoughts?
          • computeIntraQueuePreemptionDemand -> computeIntraQueuePreemptionIdealAllocation? (Ideal allocation or to-be-preempted is the goal, but demand is not)

          8) Others:

          • The new method of SchedulerApplicationAttempt may not be required; you can use getAppAttemptResourceUsage instead.
          sunilg Sunil G added a comment -

          Thanks Wangda Tan for the comments. Updating a new patch addressing the comments.

          f. orderedApps is invented to avoid sorting, correct? If you need to sort it again, we don't need it.

          orderedApps is a list of TempAppPerPartition, and its comparator is designed based on FIFO. The initial list is created from the PriorityQueue, and we remove each entry from the top to get the highest priority app. orderedApps just collects those apps in order and is later re-sorted to get the apps in reverse order. tq.getApps returns FiCaSchedulerApp, but we need TempAppPerPartition. So I think orderedApps is needed for the logic. Did I miss something?

          eepayne Eric Payne added a comment -

          Hi Sunil G,

          I noticed some odd behavior while trying the following use case:

          1. User1 starts app1 at priority1 and consumes the entire queue
          2. User1 starts app2 at priority2
          3. preemption happens for all containers from app1 except the AM container
          4. app2 consumes all containers released by app1
          5. The preemption monitor preempts containers from app2! This continues as long as app2 runs.

          I believe it is caused by the following code:

          FifoIntraQueuePreemptionPolicy#getResourceDemandFromAppsPerQueue
              Collection<TempAppPerPartition> appsOrderedByPriority = tq.getApps();
              Resource actualPreemptNeeded = null;
          
              for (TempAppPerPartition a1 : appsOrderedByPriority) {
                for (String label : a1.getPartitions()) {
          
                  // Updating pending resource per-partition level.
                  if ((actualPreemptNeeded = resToObtainByPartition.get(label)) == null) {
                    actualPreemptNeeded = Resources.createResource(0, 0);
                    resToObtainByPartition.put(label, actualPreemptNeeded);
                  }
                  Resources.addTo(actualPreemptNeeded, a1.getActuallyToBePreempted());
                }
              }
              return resToObtainByPartition;
          

          Since app1's AM container is still running, the size of actuallyToBePreempted for app1 is the size of the AM's container. This gets added to actualPreemptNeeded and put into resToObtainByPartition, which then gets passed to IntraQueueCandidatesSelector#preemptFromLeastStarvedApp. preemptFromLeastStarvedApp skips app1's AM, and then preempts from the only remaining thing with resources, which is app2.

          I'm not sure exactly how I would fix this yet, except to consider the size of the AM when calculating actuallyToBePreempted.
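
          Editor's note: one possible, purely illustrative direction (not a reviewed fix) would be to cap each app's contribution to resToObtainByPartition by what it holds beyond its AM container. getUsed() and getAMUsed() below are hypothetical TempAppPerPartition accessors, and a single-dimension min is assumed:

              for (TempAppPerPartition a1 : appsOrderedByPriority) {
                for (String label : a1.getPartitions()) {
                  Resource toObtain = resToObtainByPartition.get(label);
                  if (toObtain == null) {
                    toObtain = Resources.createResource(0, 0);
                    resToObtainByPartition.put(label, toObtain);
                  }
                  // Cap by (used - AM) so an app reduced to just its AM container no
                  // longer inflates the demand that later falls on other apps.
                  Resource beyondAM = Resources.subtract(a1.getUsed(), a1.getAMUsed());
                  Resources.addTo(toObtain,
                      Resources.min(rc, clusterResource, a1.getActuallyToBePreempted(), beyondAM));
                }
              }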

          leftnoteasy Wangda Tan added a comment -

          Sunil G,

          orderedApps is a list of TempAppPerPartition, and its comparator is designed based on FIFO. The initial list is created from the PriorityQueue, and we remove each entry from the top to get the highest priority app. orderedApps just collects those apps in order and is later re-sorted to get the apps in reverse order. tq.getApps returns FiCaSchedulerApp, but we need TempAppPerPartition. So I think orderedApps is needed for the logic. Did I miss something?

          This is only a minor comment: what I meant is, if orderedApps needs to be re-sorted (use Collections.sort(list, comparator)), you don't need the remove-and-add logic. Instead, you can avoid removing entries and simply re-sort the original list. Let me know if I didn't understand it correctly.
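
          Editor's sketch of the suggestion: keep one list and re-sort it in place with Collections.sort instead of draining a PriorityQueue and re-adding entries. App and the comparator below are stand-ins for TempAppPerPartition and TAPriorityComparator:

              import java.util.*;

              public class ResortSketch {
                static class App {
                  final String id; final int priority;
                  App(String id, int priority) { this.id = id; this.priority = priority; }
                  public String toString() { return id + "(p" + priority + ")"; }
                }

                public static void main(String[] args) {
                  Comparator<App> byPriorityDesc = (a, b) -> Integer.compare(b.priority, a.priority);
                  List<App> orderedApps = new ArrayList<>(Arrays.asList(
                      new App("app1", 1), new App("app2", 2), new App("app3", 1)));

                  Collections.sort(orderedApps, byPriorityDesc);             // highest priority first
                  System.out.println("ideal-allocation order: " + orderedApps);

                  Collections.sort(orderedApps, byPriorityDesc.reversed());  // lowest priority first
                  System.out.println("preemption order: " + orderedApps);
                }
              }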

          sunilg Sunil G added a comment -

          Thanks Eric Payne for pointing out this corner case.

          Updating new patch.

          eepayne Eric Payne added a comment -

          Thanks, Sunil G, for the new patch. Preemption for the purposes of preventing priority inversion seems to work now without unneeded preemption.

          However, in this new patch, user-limit-percent preemption doesn't seem to be working. If:

          1. user1 starts app1 at priority1 on Queue1 and consumes the entire queue
          2. user2 starts app2 at priority1 on Queue1

          preemption does not happen.

          I will continue to investigate, but I thought I would let you know.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 14s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 7m 57s trunk passed
          +1 compile 0m 34s trunk passed
          +1 checkstyle 0m 24s trunk passed
          +1 mvnsite 0m 41s trunk passed
          +1 mvneclipse 0m 17s trunk passed
          +1 findbugs 1m 5s trunk passed
          +1 javadoc 0m 22s trunk passed
          +1 mvninstall 0m 33s the patch passed
          +1 compile 0m 34s the patch passed
          +1 javac 0m 34s the patch passed
          -1 checkstyle 0m 22s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 82 new + 179 unchanged - 29 fixed = 261 total (was 208)
          +1 mvnsite 0m 36s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          -1 findbugs 1m 10s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0)
          -1 javadoc 0m 24s hadoop-yarn-server-resourcemanager in the patch failed.
          -1 unit 38m 6s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          54m 38s



          Reason Tests
          FindBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
            org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector$TAPriorityComparator implements Comparator but not Serializable At IntraQueueCandidatesSelector.java:Serializable At IntraQueueCandidatesSelector.java:[lines 45-55]
            org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector$TAReverseComparator implements Comparator but not Serializable At IntraQueueCandidatesSelector.java:Serializable At IntraQueueCandidatesSelector.java:[lines 59-69]
            Unread field:TempAppPerPartition.java:[line 60]
          Failed junit tests hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
            hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12831577/YARN-2009.0005.patch
          JIRA Issue YARN-2009
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 882d9c86c04c 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 272a217
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/13302/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          findbugs https://builds.apache.org/job/PreCommit-YARN-Build/13302/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.html
          javadoc https://builds.apache.org/job/PreCommit-YARN-Build/13302/artifact/patchprocess/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/13302/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/13302/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13302/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13302/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          sunilg Sunil G added a comment -

          Hi Eric Payne
          Thanks for sharing the details of the use case. I have checked this problem and I know why that scenario is not working.

          FifoIntraQueuePreemptionPlugin.validateOutSameAppPriorityFromDemand is added to ensure that we do not preempt for demand coming from the same priority level. This code path is being hit and causing zero preemption in your scenario.

          I wanted to add a different condition for both scenarios (user-limit alone AND user-limit + priority), but I would like to do that in another ticket, so it will be easier to track and test. This current patch will still handle priority and priority + user-limit. Thoughts? Eric Payne and Wangda Tan

          eepayne Eric Payne added a comment -

          Thanks, Sunil G, for your reply.


          • FifoIntraQueuePreemptionPlugin#calculateIdealAssignedResourcePerApp
            • The assignment to tmpApp.idealAssigned should be cloned:
                      tmpApp.idealAssigned = Resources.min(rc, clusterResource,
                          queueTotalUnassigned, appIdealAssigned);
              ...
                    Resources.subtractFrom(queueTotalUnassigned, tmpApp.idealAssigned);
              
            • In the above code, if queueTotalUnassigned is less than appIdealAssigned, then tmpApp.idealAssigned is assigned a reference to queueTotalUnassigned. Then, later, tmpApp.idealAssigned is actually subtracted from itself.
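
            Editor's note: a minimal standalone demonstration of the aliasing (values are illustrative; the Resources semantics are exactly what is described above):

              import org.apache.hadoop.yarn.api.records.Resource;
              import org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator;
              import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
              import org.apache.hadoop.yarn.util.resource.Resources;

              public class CloneMinSketch {
                public static void main(String[] args) {
                  ResourceCalculator rc = new DefaultResourceCalculator();
                  Resource clusterResource = Resource.newInstance(16384, 16);
                  Resource queueTotalUnassigned = Resource.newInstance(4096, 4);
                  Resource appIdealAssigned = Resource.newInstance(8192, 8);

                  // Resources.min returns a reference to the smaller argument, not a copy,
                  // so idealAssigned now aliases queueTotalUnassigned.
                  Resource idealAssigned =
                      Resources.min(rc, clusterResource, queueTotalUnassigned, appIdealAssigned);

                  // subtractFrom mutates its first argument in place; because of the aliasing,
                  // this subtracts the Resource from itself and both end up at <memory:0, vCores:0>.
                  Resources.subtractFrom(queueTotalUnassigned, idealAssigned);
                  System.out.println(queueTotalUnassigned);

                  // Cloning the result of min(), as suggested above, breaks the aliasing.
                }
              }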

          This current patch will still handle priority and priority + user-limit. Thoughts?

          I am not comfortable with fixing this in another patch. Our main use case is the one where multiple users need to use the same queue with apps at the same priority.

          • I still need to think through all of the effects, but I was thinking that something like the following could work:
            • I think my use case is failing because FifoIntraQueuePreemptionPlugin#calculateIdealAssignedResourcePerApp orders the apps by priority. I think that instead, it should order the apps by how much they are underserved. I think that it should be ordering the apps by tmpApp.toBePreemptedByOther instead of priority.
            • Then, if calculateIdealAssignedResourcePerApp orders the apps by toBePreemptedByOther, validateOutSameAppPriorityFromDemand would also need to not compare priorities but the app's requirements.
            • I think it should be something like the following, maybe:
                  while (lowAppIndex < highAppIndex
                      && !apps[lowAppIndex].equals(apps[highAppIndex])
              //        && apps[lowAppIndex].getPriority() < apps[highAppIndex].getPriority()) {
                      && Resources.lessThan(rc, clusterResource,
                              apps[lowAppIndex].getToBePreemptFromOther(), 
                              apps[highAppIndex].getToBePreemptFromOther()) ) {
              
          leftnoteasy Wangda Tan added a comment -

          I am not comfortable with fixing this in another patch. Our main use case is the one where multiple users need to use the same queue with apps at the same priority...

          Eric Payne, I'm not trying to argue the importance of the use case: multiple users need to use the same queue with apps at the same priority – it's the most important use case of intra-queue preemption. But to make the patch easier to review, can we move it to a separate JIRA and start to work on it right after this JIRA gets committed (if we're generally fine with the existing JIRA)? This patch is already very dense, and reviewing a patch with changed logic is much harder than reviewing a separate patch on top of this one.

          The major reason why this patch cannot solve preemption between users is that it doesn't calculate "ideal allocation" and "preemptable resource" per user. The existing logic only caps the calculated ideal allocation by the user limit; we need to do more:
          For example, when there are two users in a queue, u1 above the user limit and u2 below it, but u2 doesn't have any more pending resource, we shouldn't preempt resources from u1 in this case.

          To solve the problem, we need to add some logic to calculateIdealAssignedResourcePerApp to calculate the actual ideal allocation / preemptable resource per user, and we need to update the while condition in validateOutSameAppPriorityFromDemand as well. It doesn't sound like a very simple patch (a few lines of change) to me.
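
          Editor's sketch of the kind of per-user accounting described above, with a single int dimension and the u1/u2 example; names and numbers are illustrative, not the patch's actual code:

              import java.util.LinkedHashMap;
              import java.util.Map;

              public class PerUserIdealSketch {
                public static void main(String[] args) {
                  int userLimit = 50;                               // assumed user-limit-resource
                  Map<String, int[]> users = new LinkedHashMap<>(); // user -> {used, pending}
                  users.put("u1", new int[] {80, 20});              // above user limit
                  users.put("u2", new int[] {20, 0});               // below user limit, no pending

                  for (Map.Entry<String, int[]> e : users.entrySet()) {
                    int used = e.getValue()[0], pending = e.getValue()[1];
                    int ideal = Math.min(used + pending, userLimit);
                    int preemptable = Math.max(0, used - ideal);
                    int extraDemand = Math.max(0, ideal - used);
                    System.out.println(e.getKey() + " ideal=" + ideal
                        + " preemptable=" + preemptable + " extra-demand=" + extraDemand);
                  }
                  // u2 has no pending resource, so total extra demand is 0; even though u1
                  // holds more than its ideal share, nothing needs to be preempted from it.
                }
              }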

          The change proposal mentioned in your above comment is not correct:

          I think it should be something like the following, maybe: ...

          This will cause applications from the same user to preempt each other according to FIFO order (A1/A2 from the same user, A1 submitted first and asking for more resource, so A1 can preempt resources from A2), which is what we want to avoid.

          And this doesn't sound correct to me either:

          I think that instead, it should order the apps by how much they are underserved. I think that it should be ordering the apps by tmpApp.toBePreemptedByOther instead of priority...

          It could violate FIFO order when preempting, and in the next scheduling cycle the preempted resource may come back to the app from which it was just preempted.

          Please let me know if you have any concerns about moving the user-limit preemption to a separate JIRA.

          Thanks,

          eepayne Eric Payne added a comment -

          Thanks for all the work Sunil G and Wangda Tan have put into this feature.

          can we move it to a separate JIRA and start to work on it right after this JIRA gets committed

          As long as we don't let the user-limit-percent preemption JIRA grow cold, I'm okay with it.

          sunilg Sunil G added a comment -

          Thanks Eric Payne
          I am already working on user-limit and will share the design thoughts and a PoC patch in a new JIRA. Thank you very much for the support. In the meantime, I will update this patch with a few more tests covering NodeLabels etc.

          eepayne Eric Payne added a comment -

          Sunil G, do you plan on backporting this to branch-2/branch-2.8?

          sunilg Sunil G added a comment -

          Updating a new patch after fixing a few Jenkins issues. Also added NodeLabel tests.

          Wangda Tan and Eric Payne pls help to check.

          Eric Payne, I think we can bring this to branch-2. I will check whether the other surgical preemption patches are in 2.8.

          eepayne Eric Payne added a comment -

          Thanks Sunil G. In order to backport YARN-2009 to 2.8, I think that at least YARN-4108 and YARN-4822 would need to also be backported to 2.8.

          eepayne Eric Payne added a comment -

          Sunil G, I think you may have missed my comment from above:

          • FifoIntraQueuePreemptionPlugin#calculateIdealAssignedResourcePerApp
            • The assignment to tmpApp.idealAssigned should be cloned:
                      tmpApp.idealAssigned = Resources.min(rc, clusterResource,
                          queueTotalUnassigned, appIdealAssigned);
              ...
                    Resources.subtractFrom(queueTotalUnassigned, tmpApp.idealAssigned);
              
            • In the above code, if queueTotalUnassigned is less than appIdealAssigned, then tmpApp.idealAssigned is assigned a reference to queueTotalUnassigned. Then, later, tmpApp.idealAssigned is actually subtracted from itself.

          So, I think the above code should be:

                  tmpApp.idealAssigned = Resources.clone(Resources.min(...));
          
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 6m 38s trunk passed
          +1 compile 0m 31s trunk passed
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 0m 38s trunk passed
          +1 mvneclipse 0m 16s trunk passed
          +1 findbugs 0m 57s trunk passed
          +1 javadoc 0m 20s trunk passed
          +1 mvninstall 0m 32s the patch passed
          +1 compile 0m 32s the patch passed
          +1 javac 0m 32s the patch passed
          -1 checkstyle 0m 21s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 69 new + 178 unchanged - 30 fixed = 247 total (was 208)
          +1 mvnsite 0m 38s the patch passed
          +1 mvneclipse 0m 15s the patch passed
          -1 whitespace 0m 0s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix.
          -1 findbugs 1m 13s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
          -1 javadoc 0m 21s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 unit 39m 5s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          53m 53s



          Reason Tests
          FindBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
            org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector$TAPriorityComparator implements Comparator but not Serializable At IntraQueueCandidatesSelector.java:Serializable At IntraQueueCandidatesSelector.java:[lines 43-53]
            org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector$TAReverseComparator implements Comparator but not Serializable At IntraQueueCandidatesSelector.java:Serializable At IntraQueueCandidatesSelector.java:[lines 57-67]



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12833185/YARN-2009.0006.patch
          JIRA Issue YARN-2009
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux ea0388f75bb8 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 332a61f
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/13380/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          whitespace https://builds.apache.org/job/PreCommit-YARN-Build/13380/artifact/patchprocess/whitespace-eol.txt
          findbugs https://builds.apache.org/job/PreCommit-YARN-Build/13380/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.html
          javadoc https://builds.apache.org/job/PreCommit-YARN-Build/13380/artifact/patchprocess/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13380/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13380/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          sunilg Sunil G added a comment -

          Yes Eric Payne. That's perfectly correct. Updated a new patch where we take a clone of the Resources.min(..) result.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 6m 55s trunk passed
          +1 compile 0m 31s trunk passed
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 0m 37s trunk passed
          +1 mvneclipse 0m 17s trunk passed
          +1 findbugs 0m 59s trunk passed
          +1 javadoc 0m 20s trunk passed
          +1 mvninstall 0m 30s the patch passed
          +1 compile 0m 29s the patch passed
          +1 javac 0m 29s the patch passed
          -1 checkstyle 0m 20s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 70 new + 178 unchanged - 30 fixed = 248 total (was 208)
          +1 mvnsite 0m 34s the patch passed
          +1 mvneclipse 0m 15s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          -1 findbugs 1m 5s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
          -1 javadoc 0m 19s hadoop-yarn-server-resourcemanager in the patch failed.
          -1 unit 38m 41s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          53m 26s



          Reason Tests
          FindBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
            org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector$TAPriorityComparator implements Comparator but not Serializable At IntraQueueCandidatesSelector.java:Serializable At IntraQueueCandidatesSelector.java:[lines 43-53]
            org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector$TAReverseComparator implements Comparator but not Serializable At IntraQueueCandidatesSelector.java:Serializable At IntraQueueCandidatesSelector.java:[lines 57-67]
          Failed junit tests hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyIntraQueue



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12833292/YARN-2009.0007.patch
          JIRA Issue YARN-2009
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux e5cbaa1382e2 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 0a85d07
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/13390/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          findbugs https://builds.apache.org/job/PreCommit-YARN-Build/13390/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.html
          javadoc https://builds.apache.org/job/PreCommit-YARN-Build/13390/artifact/patchprocess/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/13390/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/13390/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13390/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13390/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          eepayne Eric Payne added a comment -

          Sunil G, I have one concern about how TempAppPerPartition#toBePreempted is calculated. Please consider the use case where one low-priority app is taking up all resources and a second high-priority app is trying to get started:

          NOTE: I am ignoring userLimitResource and idealAssignedForUser because in this use case there is only one user. Ignoring app.selected as well.

          FifoIntraQueuePreemptionPlugin simplified pseudo code
          calculateIdealAssignedResourcePerApp:
            for each app {
              app.idealAssigned = min((app.totalUsed + app.pending), queueTotalUnassigned)
            }
          
          ...
          
          calculateToBePreemptedResourcePerApp:
            for each app {
              app.toBePreempted = app.totalUsed - app.idealAssigned - app.AMUsed
            }
          

          For this use case, let's consider that all containers, including the AMs, will be 512 MB. app1 is running, consuming all resources in the queue. The RM creates app2 and sets its pending value to be 512 MB (for the AM).

          Intra-queue preemption does the following during each round:

          AppID | priority | totalUsed MB | AMUsed MB | pending MB | queueTotalUnassigned MB
          2     | 2        | 0            | 0         | 512        | 12288
          FifoIntraQueuePreemptionPlugin#calculateIdealAssignedResourcePerApp
          app2.idealAssigned = min((app2.totalUsed + app2.pending), queueTotalUnassigned)
                             = min((0 + 512), 12288)
                             = 512
          queueTotalUnassigned = queueTotalUnassigned - app2.idealAssigned
                               = 12288 - 512
                               = 11776
          
          AppID | priority | totalUsed MB | AMUsed MB | pending MB | queueTotalUnassigned MB
          1     | 1        | 12288        | 512       | 3584       | 11776
          FifoIntraQueuePreemptionPlugin#calculateIdealAssignedResourcePerApp
          app1.idealAssigned = min((app1.totalUsed + app1.pending), queueTotalUnassigned)
                             = min((12288 + 3584), 11776)
                             = 11776
          queueTotalUnassigned = queueTotalUnassigned - app1.idealAssigned
                               = 11776 - 11776
                               = 0
          
          FifoIntraQueuePreemptionPlugin#calculateToBePreemptedResourcePerApp
          app1.toBePreempted = app1.totalUsed - app1.idealAssigned - app1.AMUsed
                             = 12288 - 11776 - 512
                             = 0
          

          In this case, app1.toBePreempted should be 512 instead of 0. Since it remains 0, no container ever gets preempted and app2 can never get its AM container.

          I'm not yet ready to suggest a solution, but I wanted to point this out while we work on it.
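          For quick verification, here is a compact plain-Java re-check of the arithmetic above; the variable names mirror the pseudo code, not the actual plugin fields, and the numbers are taken from this scenario.

                public class ToBePreemptedScenario {
                  public static void main(String[] args) {
                    long queueTotalUnassigned = 12288;

                    // app2: high priority, nothing running yet, 512 MB pending for its AM.
                    long app2Used = 0, app2Pending = 512;
                    long app2IdealAssigned = Math.min(app2Used + app2Pending, queueTotalUnassigned); // 512
                    queueTotalUnassigned -= app2IdealAssigned;                                       // 11776

                    // app1: low priority, holding the whole queue, 512 MB of which is its AM.
                    long app1Used = 12288, app1Pending = 3584, app1AmUsed = 512;
                    long app1IdealAssigned = Math.min(app1Used + app1Pending, queueTotalUnassigned); // 11776
                    queueTotalUnassigned -= app1IdealAssigned;                                       // 0

                    long app1ToBePreempted = app1Used - app1IdealAssigned - app1AmUsed;              // 0
                    System.out.println("app1.toBePreempted = " + app1ToBePreempted
                        + " (expected 512 so that app2's AM can start)");
                  }
                }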

          eepayne Eric Payne added a comment -

          Hi Sunil G. I would suggest doing the following:

          current FifoIntraQueuePreemptionPlugin#calculateToBePreemptedResourcePerApp
                Resources.subtractFrom(preemtableFromApp, tmpApp.getAMUsed());
          
          suggested FifoIntraQueuePreemptionPlugin#calculateToBePreemptedResourcePerApp
                if (Resources.lessThan(rc, clusterResource,
                      Resources.subtract(tmpApp.getUsed(), preemtableFromApp),
                      tmpApp.getAMUsed())) {
                  Resources.subtractFrom(preemtableFromApp, tmpApp.getAMUsed());
                }
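
          Applying the suggested guard to the numbers from the scenario above (again a plain-long sketch, not the real plugin code), the full 512 MB stays preemptable instead of being cancelled out by the AM deduction:

                public class AmGuardSketch {
                  public static void main(String[] args) {
                    long used = 12288, idealAssigned = 11776, amUsed = 512;
                    long preemptableFromApp = used - idealAssigned;              // 512

                    // Suggested guard: deduct the AM only when preempting the computed amount
                    // would otherwise cut into the AM container itself.
                    if (used - preemptableFromApp < amUsed) {
                      preemptableFromApp -= amUsed;
                    }
                    System.out.println("preemptable = " + preemptableFromApp);   // 512, not 0
                  }
                }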
          
          sunilg Sunil G added a comment - - edited

          Thanks Eric Payne. You are correct.
          I also ran into a similar point yesterday and found that the root cause is that we subtract tmpApp.getAMUsed() in all cases. Ideally we should deduct it only when one app's resources are fully needed to meet the preemption demand from other higher-priority apps.

          I found a solution along the lines of

                if (Resources.lessThan(rc, clusterResource,
                      Resources.subtract(tmpApp.getUsed(), preemtableFromApp),
                      tmpApp.getAMUsed())) {
                  Resources.subtractFrom(preemtableFromApp, tmpApp.getAMUsed());
                }
          

          I think this check can be placed in FifoIntraQueuePreemptionPlugin.validateOutSameAppPriorityFromDemand, so we can ensure that we deduct AMUsed only when one app's resources are fully needed for preemption. Otherwise we do not need to consider it. I am also preparing UTs to cover these cases.

          sunilg Sunil G added a comment -

          Updating a new patch with the Jenkins fixes and the AM used scenario.

          I added the check mentioned by Eric Payne in calculateToBePreemptedResourcePerApp, but also added these saved resources back as preemptable resources for the next higher-priority apps to balance out. Added various tests to cover the same.

          leftnoteasy Wangda Tan added a comment -

          Thanks for updating the patch, Sunil G, and thanks for the thorough review suggestions from Eric Payne!

          For AM resource usage while doing intra-queue preemption, as of now we don't consider preempting any AM; in other words, these resources are "frozen".

          To simplify the problem, instead of creating another pool for AM resources, I would prefer to deduct them in all calculations.

          There are a few calculations we should look at:

          • totalPreemptedResourceAllowed should deduct the total AM usage of the queue. (ResourceUsage.getAMUsed of the queue may not be correct since we increment am-used before allocating the container; it is better to sum the AM resources of all running apps, as sketched below.)
          • preemtableFromApp should deduct the AM usage of each app

          I think this solution should work; let me know if I missed anything.
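
          A rough sketch of the first point: summing AM resources directly from the running applications rather than relying on the queue's aggregated am-used. The helper shape and the getAMResource(partition) accessor are assumptions for illustration, not necessarily what the patch uses.

                import java.util.Collection;
                import org.apache.hadoop.yarn.api.records.Resource;
                import org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp;
                import org.apache.hadoop.yarn.util.resource.Resources;

                final class AmUsageSketch {
                  // Assumed helper: total AM resource consumed by the given running apps in a partition.
                  static Resource totalAmUsedForPartition(
                      Collection<FiCaSchedulerApp> runningApps, String partition) {
                    Resource totalAMUsed = Resources.createResource(0, 0);
                    for (FiCaSchedulerApp app : runningApps) {
                      // Assumption: getAMResource(partition) reports the AM container size
                      // charged against the given node label.
                      Resources.addTo(totalAMUsed, app.getAMResource(partition));
                    }
                    return totalAMUsed;
                  }
                }

          The total from such a helper would then be deducted from totalPreemptedResourceAllowed, and each app's own AM resource from its preemtableFromApp, as described above.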

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          0 mvndep 0m 10s Maven dependency ordering for branch
          +1 mvninstall 6m 35s trunk passed
          +1 compile 2m 17s trunk passed
          +1 checkstyle 0m 39s trunk passed
          +1 mvnsite 3m 32s trunk passed
          +1 mvneclipse 0m 47s trunk passed
          0 findbugs 0m 0s Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn
          +1 findbugs 0m 55s trunk passed
          +1 javadoc 1m 45s trunk passed
          0 mvndep 0m 9s Maven dependency ordering for patch
          +1 mvninstall 2m 48s the patch passed
          +1 compile 2m 15s the patch passed
          +1 javac 2m 15s the patch passed
          -1 checkstyle 0m 39s hadoop-yarn-project/hadoop-yarn: The patch generated 41 new + 178 unchanged - 30 fixed = 219 total (was 208)
          +1 mvnsite 3m 29s the patch passed
          +1 mvneclipse 0m 44s the patch passed
          -1 whitespace 0m 0s The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 xml 0m 1s The patch has no ill-formed XML file.
          0 findbugs 0m 0s Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn
          -1 findbugs 1m 5s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
          -1 javadoc 1m 22s hadoop-yarn-project_hadoop-yarn generated 2 new + 6496 unchanged - 1 fixed = 6498 total (was 6497)
          -1 javadoc 0m 18s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 2 new + 937 unchanged - 1 fixed = 939 total (was 938)
          -1 unit 21m 23s hadoop-yarn in the patch failed.
          -1 unit 18m 51s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          71m 1s



          Reason Tests
          FindBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
            org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector$TAPriorityComparator implements Comparator but not Serializable At IntraQueueCandidatesSelector.java:Serializable At IntraQueueCandidatesSelector.java:[lines 44-54]
            org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector$TAReverseComparator implements Comparator but not Serializable At IntraQueueCandidatesSelector.java:Serializable At IntraQueueCandidatesSelector.java:[lines 58-68]
          Failed junit tests hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesReservation
            hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesReservation



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12833791/YARN-2009.0008.patch
          JIRA Issue YARN-2009
          Optional Tests asflicense findbugs xml compile javac javadoc mvninstall mvnsite unit checkstyle
          uname Linux 2b4360fb615b 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / ed9fcbe
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/13413/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
          whitespace https://builds.apache.org/job/PreCommit-YARN-Build/13413/artifact/patchprocess/whitespace-eol.txt
          findbugs https://builds.apache.org/job/PreCommit-YARN-Build/13413/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.html
          javadoc https://builds.apache.org/job/PreCommit-YARN-Build/13413/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn.txt
          javadoc https://builds.apache.org/job/PreCommit-YARN-Build/13413/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/13413/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/13413/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/13413/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn.txt https://builds.apache.org/job/PreCommit-YARN-Build/13413/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13413/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13413/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          eepayne Eric Payne added a comment -

          Thanks for the updated patch Sunil G.

          Should this feature work with labelled queues? I notice that it does not. I think it is because the following code is getting the used capacity only for the default partition:

          IntraQueueCandidatesSelector#computeIntraQueuePreemptionDemand
                  if (leafQueue.getUsedCapacity() < context
                      .getMinimumThresholdForIntraQueuePreemption()) {
                    continue;
                  }
          

          The above code has access to the partition, so it should be easy to get the used capacity per partition. Perhaps something like the following:

          IntraQueueCandidatesSelector#computeIntraQueuePreemptionDemand
                  if (leafQueue.getQueueCapacities().getUsedCapacity(partition) < context
                      .getMinimumThresholdForIntraQueuePreemption()) {
                    continue;
                  }
          
          sunilg Sunil G added a comment -

          Thank you Eric Payne. Nice catch. I missed it at the queue level. I also tried deducting AM used in an earlier phase rather than adding it back to idealAssigned for other apps; I think the logic is simpler now.

          Also tested in a real cluster and saw preemption kicking in. Tried to fix some more Jenkins problems.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 22s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          0 mvndep 0m 13s Maven dependency ordering for branch
          +1 mvninstall 6m 45s trunk passed
          +1 compile 2m 18s trunk passed
          +1 checkstyle 0m 40s trunk passed
          +1 mvnsite 3m 38s trunk passed
          +1 mvneclipse 0m 46s trunk passed
          0 findbugs 0m 0s Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn
          +1 findbugs 0m 58s trunk passed
          +1 javadoc 1m 42s trunk passed
          0 mvndep 0m 8s Maven dependency ordering for patch
          +1 mvninstall 2m 47s the patch passed
          +1 compile 2m 14s the patch passed
          +1 javac 2m 14s the patch passed
          -1 checkstyle 0m 40s hadoop-yarn-project/hadoop-yarn: The patch generated 41 new + 211 unchanged - 31 fixed = 252 total (was 242)
          +1 mvnsite 3m 29s the patch passed
          +1 mvneclipse 0m 43s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 2s The patch has no ill-formed XML file.
          0 findbugs 0m 0s Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn
          -1 findbugs 1m 5s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
          -1 javadoc 1m 21s hadoop-yarn-project_hadoop-yarn generated 2 new + 6493 unchanged - 1 fixed = 6495 total (was 6494)
          -1 javadoc 0m 19s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 2 new + 937 unchanged - 1 fixed = 939 total (was 938)
          -1 unit 20m 44s hadoop-yarn in the patch failed.
          +1 unit 35m 56s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          87m 51s



          Reason Tests
          FindBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
            org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector$TAPriorityComparator implements Comparator but not Serializable At IntraQueueCandidatesSelector.java:Serializable At IntraQueueCandidatesSelector.java:[lines 47-57]
            org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector$TAReverseComparator implements Comparator but not Serializable At IntraQueueCandidatesSelector.java:Serializable At IntraQueueCandidatesSelector.java:[lines 61-71]
          Failed junit tests hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12833979/YARN-2009.0009.patch
          JIRA Issue YARN-2009
          Optional Tests asflicense findbugs xml compile javac javadoc mvninstall mvnsite unit checkstyle
          uname Linux 4180e4f83641 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / d26a1bb
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/13421/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
          findbugs https://builds.apache.org/job/PreCommit-YARN-Build/13421/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.html
          javadoc https://builds.apache.org/job/PreCommit-YARN-Build/13421/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn.txt
          javadoc https://builds.apache.org/job/PreCommit-YARN-Build/13421/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/13421/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/13421/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13421/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13421/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          sunilg Sunil G added a comment -

          Attaching a new patch in which I am trying to avoid the new API that was added to FiCaSchedulerApp to get the real AM used resources. This can be handled in the preemption policy itself if we consider only running apps. Eric Payne, Wangda Tan, please help to review.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          0 mvndep 0m 9s Maven dependency ordering for branch
          +1 mvninstall 6m 41s trunk passed
          +1 compile 2m 17s trunk passed
          +1 checkstyle 0m 40s trunk passed
          +1 mvnsite 3m 32s trunk passed
          +1 mvneclipse 0m 47s trunk passed
          0 findbugs 0m 0s Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn
          +1 findbugs 0m 56s trunk passed
          +1 javadoc 1m 43s trunk passed
          0 mvndep 0m 9s Maven dependency ordering for patch
          +1 mvninstall 2m 51s the patch passed
          +1 compile 2m 16s the patch passed
          +1 javac 2m 16s the patch passed
          -1 checkstyle 0m 38s hadoop-yarn-project/hadoop-yarn: The patch generated 25 new + 175 unchanged - 33 fixed = 200 total (was 208)
          +1 mvnsite 3m 29s the patch passed
          +1 mvneclipse 0m 43s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 xml 0m 1s The patch has no ill-formed XML file.
          0 findbugs 0m 0s Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn
          -1 findbugs 1m 5s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
          +1 javadoc 1m 22s hadoop-yarn-project_hadoop-yarn generated 0 new + 6493 unchanged - 1 fixed = 6493 total (was 6494)
          +1 javadoc 0m 19s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 0 new + 937 unchanged - 1 fixed = 937 total (was 938)
          -1 unit 25m 27s hadoop-yarn in the patch failed.
          +1 unit 38m 42s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          95m 6s



          Reason Tests
          FindBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
            org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector$TAPriorityComparator implements Comparator but not Serializable At IntraQueueCandidatesSelector.java:Serializable At IntraQueueCandidatesSelector.java:[lines 47-57]
            org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector$TAReverseComparator implements Comparator but not Serializable At IntraQueueCandidatesSelector.java:Serializable At IntraQueueCandidatesSelector.java:[lines 61-71]
          Failed junit tests hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices
            hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12834182/YARN-2009.0010.patch
          JIRA Issue YARN-2009
          Optional Tests asflicense findbugs xml compile javac javadoc mvninstall mvnsite unit checkstyle
          uname Linux b08550927f3e 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / c5573e6
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/13440/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
          findbugs https://builds.apache.org/job/PreCommit-YARN-Build/13440/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.html
          unit https://builds.apache.org/job/PreCommit-YARN-Build/13440/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/13440/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13440/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13440/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          sunilg Sunil G added a comment -

          Fixed findbugs warnings..

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 6m 53s trunk passed
          +1 compile 0m 33s trunk passed
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 0m 38s trunk passed
          +1 mvneclipse 0m 17s trunk passed
          +1 findbugs 0m 56s trunk passed
          +1 javadoc 0m 21s trunk passed
          +1 mvninstall 0m 31s the patch passed
          +1 compile 0m 29s the patch passed
          +1 javac 0m 29s the patch passed
          -1 checkstyle 0m 20s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 21 new + 177 unchanged - 32 fixed = 198 total (was 209)
          +1 mvnsite 0m 35s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 4s the patch passed
          +1 javadoc 0m 18s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 0 new + 937 unchanged - 1 fixed = 937 total (was 938)
          +1 unit 35m 14s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          49m 58s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12834224/YARN-2009.0011.patch
          JIRA Issue YARN-2009
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 5a370000ba8c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / e9c4616
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/13443/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13443/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13443/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          leftnoteasy Wangda Tan added a comment -

          Latest patch looks good, thanks Sunil G!

          Any other comments for the latest patch? Eric Payne

          eepayne Eric Payne added a comment -

          Sunil G, thanks for the updated patch.

          There is a problem with preemption when multiple users are in the same queue. Consider the following use case:

          • user1 starts app1 at priority 1 and consumes the entire queue with long-running containers. app1 has many pending resources.
          • user2 starts app2 at priority 2. app2 has many pending resources.
          • Intra-queue preemption monitor preempts containers from app1 until both app1 and app2 have equal resources.

          However, the intra-queue preemption monitor doesn't stop there. It continues to preempt containers from app1, which are then given back to app1 by the scheduler.
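
          For illustration, a minimal sketch of the stopping condition described above: once the victim app is down to its per-user share of the queue, nothing more should be selected from it. This is illustrative only; the class, method, and variable names are hypothetical, the numbers are assumed, and this is not part of the patch or of ProportionalCapacityPreemptionPolicy.

              // Sketch of the guard described above: stop selecting containers from a
              // victim app once taking more would push it below its per-user share.
              // All names and the plain-MB arithmetic are hypothetical.
              public class IntraQueuePreemptionGuardSketch {

                /** Memory (MB) still preemptable from this app without going below its share. */
                static long preemptableFromApp(long appUsedMb, long userShareMb) {
                  return Math.max(0, appUsedMb - userShareMb);
                }

                public static void main(String[] args) {
                  long userShareMb = 6 * 1024; // assumed per-user share (6 GB) once both users are active
                  long app1UsedMb  = 6 * 1024; // app1 after the first rounds of preemption

                  // Once app1 is down to its user share, nothing more should be selected from it.
                  System.out.println(preemptableFromApp(app1UsedMb, userShareMb)); // prints 0
                }
              }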

          sunilg Sunil G added a comment -

          Hi Eric Payne
          Could you please let me know the user-limit used for this case?

          sunilg Sunil G added a comment - - edited

          I tested the case below:

              String queuesConfig =
                  // guaranteed,max,used,pending,reserved
                  "root(=[100 100 55 170 0]);" + // root
                      "-a(=[40 100 10 50 0]);" + // a
                      "-b(=[60 100 45 120 0])"; // b
          
              String appsConfig =
              // queueName\t(priority,resource,host,expression,#repeat,reserved,pending)
                  "a\t" // app1 in a
                      + "(1,1,n1,,5,false,25, user);" + // app1 a
                      "a\t" // app2 in a
                      + "(2,1,n1,,5,false,25, user);" + // app2 a
                      "b\t" // app3 in b
                      + "(4,1,n1,,40,false,20,user1);" + // app3 b
                      "b\t" // app1 in a
                      + "(6,1,n1,,5,false,30,user2)";
          
              verify(mDisp, times(16)).handle(argThat(
                  new TestProportionalCapacityPreemptionPolicy.IsPreemptionRequestFor(
                      getAppAttemptId(3))));
              verify(mDisp, never()).handle(argThat(
                  new TestProportionalCapacityPreemptionPolicy.IsPreemptionRequestFor(
                      getAppAttemptId(4))));
          

          Here queue 'b' has 3 users, and 60% is the cap for this queue, so I am considering 20GB for each user (33.33% user-limit).

          Given 20GB, preemption happens for 16GB, considering 4GB is already running for app3. But I am not seeing any extra preemption from the same-level app. Maybe you could share some more details.

          eepayne Eric Payne added a comment -

          Hi Sunil G. Here is a description of my test environment, the steps I executed, and the results I am seeing.

          I don't know why the unit test you described above is not catching this, but I will continue to investigate. In the meantime, can you please try the following and let me know what you discover?


          Property Name Property Value
          monitoring_interval (ms) 1000
          max_wait_before_kill (ms) 500
          total_preemption_per_round 1.0
          max_ignored_over_capacity 0.2
          select_based_on_reserved_containers true
          natural_termination_factor 2.0
          intra-queue-preemption.enabled true
          intra-queue-preemption.minimum-threshold 0.5
          intra-queue-preemption.max-allowable-limit 0.1
          Cluster
          Nodes: 3
          Mem per node: 4 GB
          Total Cluster Size: 12 GB
          Container size: 0.5 GB
          
          Queue Guarantee Max Minimum user limit percent User Limit Factor
          root 100% (12 GB) 100% (12 GB) N/A N/A
          default 50% (6 GB) 100% (12 GB) 50% (2 users can run in queue simultaneously) 2.0 (one user can consume twice the queue's Guarantee)
          eng 50% (6 GB) 100% (12 GB) 50% (2 users can run in queue simultaneously) 2.0 (one user can consume twice the queue's Guarantee)
          • user1 starts app1 at priority 1 in the default queue, and requests 30 mappers which want to run for 10 minutes each:
            • Sleep job: -m 30 -mt 600000
            • Total requested resources are 15.5 GB: (30 map containers * 0.5 GB per container) + 0.5 GB AM container
              App Name User Name Priority Used Pending
              app1 user1 1 0 15.5 GB
          • The RM assigns app1 24 containers, consuming 12 GB (all cluster resources):
            • (23 mappers * 0.5 GB) + 0.5 GB AM = 12 GB
              App Name User Name Priority Used Pending
              app1 user1 1 12 GB 3.5 GB
          • user2 starts app2 at priority 2 in the default queue, and requests 30 mappers which want to run for 10 minutes each:
            App Name User Name Priority Used Pending
            app1 user1 1 12 GB 3.5 GB
            app2 user2 2 0 15.5 GB
          • The intra-queue preemption monitor iterates over the containers for several monitoring_intervals and preempts 12 containers (6 GB of resources)
          • The RM assigns the preempted containers to app2
            App Name User Name Priority Used Pending
            app1 user1 1 6 GB 3.5 GB
            app2 user2 2 6 GB 3.5 GB
          • The intra-queue preemption monitor continues to preempt containers from app1.
            • However, since the MULP for the default queue should be 6GB, the RM gives the preempted containers back to app1
            • This repeats indefinitely.
          eepayne Eric Payne added a comment -

          Hi Sunil G. I am confused by something you said in the comment above:

          I tested the case below:

          ...
                      "b\t" // app3 in b
                      + "(4,1,n1,,40,false,20,_user1_);" + // app3 b
                      "b\t" // app1 in a
                      + "(6,1,n1,,5,false,30,_user2_)";
          ...
          

          I assumed that the above was from a unit test. As far as I can tell, nothing in the o.a.h.y.s.r.monitor.capacity framework supports testing with different users. Were you using the above code as pseudocode to document a manual test?

          sunilg Sunil G added a comment -

          Thanks Eric Payne for sharing the detailed test scenario.

          I tested manually as well, but the test code which I posted was from my unit test. I will try to mock the case as you mentioned and will update with the root cause. Thank you.

          eepayne Eric Payne added a comment -

          the test code which I posted was from my unit test

          Thanks Sunil G. I cannot find anywhere in the capacity preemption tests where it takes different user IDs as parameters. I don't see that in the YARN-2009.0011.patch either. Can you please help me understand what I'm missing?

          sunilg Sunil G added a comment -

          Eric Payne, I had added a unit test case which was not included in the posted patch. I'll upload a new patch with that test case.

          sunilg Sunil G added a comment -

          The new test case is named testPreemptionWithTwoUsers.

          I have mocked the user-limit with the equation below (a very simple calculation):

          With 2 users defined, the user-limit is taken as 50%, so user-limit = guaranteed * 0.5 (since there are only 2 users).

          Meanwhile, I am testing on a real cluster and will update my analysis soon.
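
          For reference, the mocked calculation works out as in this small sketch (the class and method names are hypothetical; this is not the test code itself):

              // Sketch of the mocked user-limit used in the test: with N active users and
              // no user-limit factor applied, each user's limit is simply guaranteed / N.
              public class MockedUserLimitSketch {

                static long mockedUserLimit(long queueGuaranteed, int activeUsers) {
                  return queueGuaranteed / activeUsers;
                }

                public static void main(String[] args) {
                  // Queue 'b' from the test: guaranteed = 60 with two users => user-limit = 30.
                  System.out.println(mockedUserLimit(60, 2)); // prints 30
                }
              }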

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 7m 18s trunk passed
          +1 compile 0m 31s trunk passed
          +1 checkstyle 0m 22s trunk passed
          +1 mvnsite 0m 37s trunk passed
          +1 mvneclipse 0m 16s trunk passed
          +1 findbugs 0m 56s trunk passed
          +1 javadoc 0m 20s trunk passed
          +1 mvninstall 0m 30s the patch passed
          +1 compile 0m 29s the patch passed
          +1 javac 0m 29s the patch passed
          -1 checkstyle 0m 20s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 22 new + 175 unchanged - 33 fixed = 197 total (was 208)
          +1 mvnsite 0m 34s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 2s the patch passed
          +1 javadoc 0m 18s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 0 new + 937 unchanged - 1 fixed = 937 total (was 938)
          -1 unit 38m 46s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 15s The patch does not generate ASF License warnings.
          53m 45s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyIntraQueue



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12834656/YARN-2009.0012.patch
          JIRA Issue YARN-2009
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 5cf76f590758 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / f63cd78
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/13465/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/13465/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/13465/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13465/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13465/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          sunilg Sunil G added a comment -

          New test case has an issue. Fixed.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 6m 43s trunk passed
          +1 compile 0m 33s trunk passed
          +1 checkstyle 0m 22s trunk passed
          +1 mvnsite 0m 37s trunk passed
          +1 mvneclipse 0m 17s trunk passed
          +1 findbugs 0m 55s trunk passed
          +1 javadoc 0m 21s trunk passed
          +1 mvninstall 0m 36s the patch passed
          +1 compile 0m 38s the patch passed
          +1 javac 0m 38s the patch passed
          -1 checkstyle 0m 23s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 24 new + 175 unchanged - 33 fixed = 199 total (was 208)
          +1 mvnsite 0m 43s the patch passed
          +1 mvneclipse 0m 15s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 18s the patch passed
          +1 javadoc 0m 22s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 0 new + 937 unchanged - 1 fixed = 937 total (was 938)
          +1 unit 40m 17s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 asflicense 0m 15s The patch does not generate ASF License warnings.
          55m 39s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12834718/YARN-2009.0013.patch
          JIRA Issue YARN-2009
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux e15be54c2f09 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / f63cd78
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/13466/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13466/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13466/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          sunilg Sunil G added a comment -

          Thank you very much, Eric Payne, for uncovering the user-limit scenario. It is highly appreciated.

          The root cause of the issue is that the user-limit included the real AM usage, which we avoid internally everywhere else. I have made the fix and tested it in a real cluster.
          The main code change is in FifoIntraQueuePreemptionPlugin.calculateIdealAssignedResourcePerApp. Eric Payne and Wangda Tan, could you please take a quick look at this function, where the user-limit is decremented by queue.amUsed?
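
          As I understand the description above, the adjustment is roughly as in the following sketch (hypothetical names, assumed numbers, and plain-MB arithmetic; this is not the actual FifoIntraQueuePreemptionPlugin code):

              // Sketch of the idea: exclude AM usage when capping an app's ideal
              // assignment by the user-limit, so AM containers are not counted against it.
              public class UserLimitAmAdjustmentSketch {

                /** Cap an app's ideal assignment by the user-limit after removing AM usage. */
                static long capIdealAssigned(long pendingPlusUsedMb, long userLimitMb, long amUsedMb) {
                  long effectiveUserLimitMb = Math.max(0, userLimitMb - amUsedMb);
                  return Math.min(pendingPlusUsedMb, effectiveUserLimitMb);
                }

                public static void main(String[] args) {
                  // E.g. user-limit 6144 MB with AM containers already holding 512 MB:
                  System.out.println(capIdealAssigned(9728, 6144, 512)); // prints 5632
                }
              }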

          Thank You.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 6m 40s trunk passed
          +1 compile 0m 31s trunk passed
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 0m 37s trunk passed
          +1 mvneclipse 0m 16s trunk passed
          +1 findbugs 0m 55s trunk passed
          +1 javadoc 0m 22s trunk passed
          +1 mvninstall 0m 31s the patch passed
          +1 compile 0m 28s the patch passed
          +1 javac 0m 28s the patch passed
          -1 checkstyle 0m 20s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 24 new + 175 unchanged - 33 fixed = 199 total (was 208)
          +1 mvnsite 0m 35s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 1s the patch passed
          +1 javadoc 0m 18s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 0 new + 937 unchanged - 1 fixed = 937 total (was 938)
          -1 unit 40m 20s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          54m 45s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.TestRMRestart



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12834742/YARN-2009.0014.patch
          JIRA Issue YARN-2009
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 5f9b3c6dcd78 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / ae8bccd
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/13472/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/13472/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/13472/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13472/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13472/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          sunilg Sunil G added a comment -

          The test case failures are not related; YARN-5362 is handling them.

          leftnoteasy Wangda Tan added a comment -

          Thanks for the update, Sunil G.

          I have one doubt: should we deduct each user's own am-used from that user's user-limit? The behavior in the patch is to deduct the sum of am-used across all users in the queue.
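
          To make the two alternatives concrete, a small sketch (hypothetical names and assumed numbers, not the patch code): (a) deduct the queue-wide AM usage from every user's limit, which is the behavior described for the current patch, versus (b) deduct only that user's own AM usage from that user's limit, which is the suggestion above.

              import java.util.HashMap;
              import java.util.Map;

              // Sketch contrasting queue-wide vs. per-user AM deduction from the user-limit.
              public class AmDeductionAlternativesSketch {

                // (a) Deduct the sum of AM usage across all users in the queue.
                static long queueWideDeduction(long userLimitMb, Map<String, Long> amUsedPerUser) {
                  long totalAmUsedMb = amUsedPerUser.values().stream().mapToLong(Long::longValue).sum();
                  return Math.max(0, userLimitMb - totalAmUsedMb);
                }

                // (b) Deduct only the given user's own AM usage.
                static long perUserDeduction(long userLimitMb, Map<String, Long> amUsedPerUser, String user) {
                  return Math.max(0, userLimitMb - amUsedPerUser.getOrDefault(user, 0L));
                }

                public static void main(String[] args) {
                  Map<String, Long> amUsed = new HashMap<>();
                  amUsed.put("user1", 512L);
                  amUsed.put("user2", 512L);
                  System.out.println(queueWideDeduction(6144, amUsed));        // 5120: both AMs deducted
                  System.out.println(perUserDeduction(6144, amUsed, "user1")); // 5632: only user1's AM
                }
              }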

          sunilg Sunil G added a comment -

          Thank you, Wangda Tan. You are correct; we didn't have that metric yet. I am now calculating it from the running apps, so for preemption it will be fine. Kindly help to check.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 6m 40s trunk passed
          +1 compile 0m 32s trunk passed
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 0m 37s trunk passed
          +1 mvneclipse 0m 17s trunk passed
          +1 findbugs 0m 58s trunk passed
          +1 javadoc 0m 21s trunk passed
          +1 mvninstall 0m 31s the patch passed
          +1 compile 0m 30s the patch passed
          +1 javac 0m 30s the patch passed
          -1 checkstyle 0m 20s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 26 new + 175 unchanged - 33 fixed = 201 total (was 208)
          +1 mvnsite 0m 34s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 3s the patch passed
          +1 javadoc 0m 19s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 0 new + 937 unchanged - 1 fixed = 937 total (was 938)
          +1 unit 39m 19s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 asflicense 0m 15s The patch does not generate ASF License warnings.
          53m 52s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12835107/YARN-2009.0015.patch
          JIRA Issue YARN-2009
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux eeb2c9457c32 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / dbd2057
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/13499/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13499/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13499/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          eepayne Eric Payne added a comment -

          Sunil G, I am still analyzing the code, but I do want to point out one concern that I have.

          TestProportionalCapacityPreemptionPolicyIntraQueue#testPreemptionWithTwoUsers: In this test, the user limit resource is 30, and app3 has 40 resources. So, at most, I would expect only 10 resources to be preempted from app3, which would bring app3 down to 30 resources to match the ULR. Instead, 25 resources were preempted. I feel that FifoIntraQueuePreemptionPlugin#calculateIdealAssignedResourcePerApp should not allow tmpApp.idealAssigned to be lower than the ULR.
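
          A minimal sketch of the clamp suggested above (hypothetical names, not the patch code; the computed ideal of 15 is inferred from 40 used minus 25 preempted):

              // Sketch: an app's ideal assignment should not be pushed below its
              // user-limit resource (ULR), so at most (used - ULR) is preemptable from it.
              public class IdealAssignedFloorSketch {

                static long idealAssignedWithFloor(long computedIdeal, long userLimitResource) {
                  return Math.max(computedIdeal, userLimitResource);
                }

                public static void main(String[] args) {
                  long used = 40, ulr = 30, computedIdeal = 15; // numbers from the test discussed above
                  long ideal = idealAssignedWithFloor(computedIdeal, ulr);
                  System.out.println(used - ideal); // at most 10 preempted, not 25
                }
              }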

          One more thing I noticed during my manual testing: although unnecessary preemption happens a lot less now, there are cases where it still happens. I haven't drilled down yet.

          sunilg Sunil G added a comment -

          Thanks Eric Payne for the update.

          I will also investigate a few more scenarios with user-limit and priority together. For the test case, I think there is a stubbing issue in mockLeafQueue; I can make a change for that. Meanwhile, I will run some more manual tests on my end and will try to share a report here. In the meantime, please let me know if you run into any issues.

          sunilg Sunil G added a comment -

          I understand the reason for this issue. The queue's free resources were not considered in the algorithm while calculating idealAssigned, so the current code works in cases where the queue is full or over its guarantee. If the queue is under-utilized, we may over-preempt.
          I'll share a patch shortly.
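
          To illustrate the point about free resources, a small sketch under assumed numbers (hypothetical names, not the patch itself): free guaranteed capacity in the queue can absorb part of the high-priority demand, so only the remainder needs to be preempted from running apps.

              // Sketch: account for the queue's unused guaranteed capacity before
              // deciding how much must actually be preempted from running apps.
              public class FreeResourceAwarePreemptionSketch {

                static long resourcesToPreempt(long highPriorityDemand, long queueGuaranteed, long queueUsed) {
                  long freeInQueue = Math.max(0, queueGuaranteed - queueUsed);
                  return Math.max(0, highPriorityDemand - freeInQueue);
                }

                public static void main(String[] args) {
                  // Under-utilized queue: guaranteed 60, used 45, high-priority demand 20.
                  System.out.println(resourcesToPreempt(20, 60, 45)); // prints 5, not 20
                }
              }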

          leftnoteasy Wangda Tan added a comment -

          Eric Payne, Sunil G.

          I'm not sure we need to consider a queue's free resources.

          There are two scenarios we need to cover:
          1) One queue's used < configured, and the cluster is full
          2) One queue's used < configured, and the cluster has enough free resources

          For 1), we should do something for the queue to make sure it can balance itself; if inter-queue preemption happens at the same time, less intra-queue preemption happens.
          For 2), it is possible that we mark some resources as preemption candidates, but before they are preempted the queue gets more resources and the previously added candidates are cancelled.

          Also, the whole point of intra-queue preemption is to make sure a queue can balance itself. We have intra-queue preemption configs (minimum-threshold / max-allowable-limit) to make sure we:
          1) Preempt a queue's resources only when usage is beyond the minimum threshold
          2) Do not preempt too many resources
          In addition, inter-queue preemption happens before intra-queue preemption.

          With all the above safeguards, the existing logic looks safe, since we won't kill containers immediately after they get selected. I'm a little concerned that considering the queue's (configured - used) resources could sometimes prevent intra-queue preemption from being triggered.

          leftnoteasy Wangda Tan added a comment -

          I suggest committing this patch first and fixing other possible problems separately.

          This patch is already very complex and dense, and reviewing the modified logic is very painful. Beyond the user-limit related computations, I don't think it has very critical issues; many of the potential computation mistakes should be corrected automatically by several rounds of scheduling/preemption logic (just like the example in the above comment).

          Let's focus on all the user-limit related issues in the next patch.

          Thoughts?

          eepayne Eric Payne added a comment -

          there are cases where unnecessary preemption still happens.

          I have a multi-tenant use case where 1 or 2 intra-queue preemptions occur unnecessarily every cycle.

          many of the potential computation mistakes should be corrected automatically by several rounds

          This one will not. It happens every round.

          I can provide the details of my manual test if you would like. I haven't found a way yet to change the configs to make this stop happening. If you have suggestions, please let me know.

          Many of our queues are used by multiple users, and if a high-priority app can cause apps from other users to be unnecessarily preempted every preemption cycle, that will not be acceptable to our users.

          Let's focus on all the user-limit related issues in the next patch.

          The patch works well if users don't share queues. However, my concern is that if users share queues AND they want to use this feature to manage priorities between apps of the same user, this will cause problems with other users in the queue. If we put this in as it is, we are stating that this feature must not be turned on if any preemptable queue in the cluster is multi-tenant. Is that what we want?

          leftnoteasy Wangda Tan added a comment -

          Hi Eric Payne,

          Really appreciate your help testing this feature and giving feedback from the beginning. We have exactly the same requirement: users sharing queues with different app priorities. I was not trying to declare this feature complete – I don't think users should use it before we fix the user-limit issues you found.

          My suggestion is: commit the existing patch; at least it works for a single user and (I think) it doesn't have a fundamental issue. We can fix the rest in the next patch with the highest priority. Does this sound OK to you? I just feel that reviewing changes to this patch is too painful; every time it feels like I'm reviewing a new patch because it's hard to tell which places changed.

          I think the next patch should be incremental and much easier to review.

          Please let me know your thoughts, thanks.

          sunilg Sunil G added a comment -

          Hi Eric Payne and Wangda Tan

          I am also in line with Wangda's thought. I feel that the fundamental basic cases are running fine, and free resources are handled by the surgical preemption feature. As discussed earlier, we are coming with dead zone, natural-termination-factor etc. in the next patch, which should help us avoid some edge-case over-preemptions, if any.

          eepayne Eric Payne added a comment -

          My suggestion is: commit the existing patch; at least it works for a single user and (I think) it doesn't have a fundamental issue. We can fix the rest in the next patch with the highest priority. Does this sound OK

          Yes, that's fine. I assume our next step is to begin working on YARN-2113?

          sunilg Sunil G added a comment -

          Thanks Eric Payne.
          Yes. I am working on the scheduler's user-limit improvement (a JIRA will be raised separately) and on YARN-2113.

          curino Carlo Curino added a comment -

          I am starting to review this as part of the YARN-meetup bug bash. Sunil G, please stay tuned.

          leftnoteasy Wangda Tan added a comment -

          Thanks Carlo Curino. And Eric Payne, could you also share your thoughts on the latest patch?

          curino Carlo Curino added a comment -

          There is a lot going on, so I am partially trusting Wangda Tan's and Eric Payne's long review process.
          I think I can mostly get what you are trying to do, and the patch mostly looks good.

          Please fix the checkstyles, as well as consider the following:

          1. In IntraQueuePreemptionComputePlugin.getResourceDemandFromAppsPerQueue(LeafQueue leafQueue, String partition) you ask to pass in a LeafQueue, but the only implementor, FifoIntraQueuePreemptionPlugin, only seems to care about the queue name. Can we simply pass in a String? I am concerned we bleed references to queues and apps a bit too much (see the sketch after this list).
          2. Readability: you call queueTotalUnassigned a quantity that is currently assigned, but you are considering it "reassignable"... I think the naming is misleading. This happens in FifoIntraQueuePreemptionPlugin and its callers.
          3. More flow-level comments/explanations would help in understanding the class relationships and the intents of the various methods/steps.
          4. Comments and tests are not always consistent in TestProportionalCapacityPreemptionPolicyIntraQueue... e.g., the first split of guaranteed resources differs between the comment and the string used to configure the test. Also please add more comments (e.g., what is testPreemptionWithTwoUsers trying to achieve?).
          5. Question: in TestProportionalCapacityPreemptionPolicyIntraQueue.testSimpleIntraQueuePreemption(), why is app4 seeing preemption first?
          6. Generally for TestProportionalCapacityPreemptionPolicyIntraQueue it is nice to have spelled-out cases like you have, but for coverage it also makes sense to have a test that runs many large randomly generated cases, and for which you check simple invariants (e.g., no preemption within a priority, no priority inversions, etc.)... This might help unveil corner cases before we hit prod deployments.
          7. Is maxIntraQueuePreemptableLimit set to 50% a reasonable default? It seems maybe a bit high?
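
          For illustration, a rough sketch of the narrower contract suggested in (1). The interface name and the return type below are assumptions made for the example only, not the actual IntraQueuePreemptionComputePlugin API:

            import java.util.Collection;
            import org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TempAppPerPartition;

            // Hypothetical sketch: the plugin asks only for what it actually uses
            // (the queue name) instead of holding a reference to the whole LeafQueue.
            interface ResourceDemandByQueueName {
              Collection<TempAppPerPartition> getResourceDemandFromAppsPerQueue(
                  String queueName, String partition);
            }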
          eepayne Eric Payne added a comment -

          The latest patch LGTM, as long as we commit it with the understanding that intra-queue preemption can only be enabled if no preemptable queues in the cluster are multi-tenant.

          curino Carlo Curino added a comment -

          Per our discussion it is OK to postpone (6) to a separate patch (maybe using SLS as a framework for the unit tests).

          curino Carlo Curino added a comment -

          BTW, other than the small issues above, it is LGTM for me as well.

          sunilg Sunil G added a comment -

          Thank you very much Carlo Curino for the comments.

          Question: in TestProportionalCapacityPreemptionPolicyIntraQueue.testSimpleIntraQueuePreemption(), why is app4 seeing preemption first?

          app3 and app4 are of the same priority. However, app3 is added to the queue first, and then app4. Hence, according to FIFO within the same priority level, app4 is the last submitted, so app4 will be selected for preemption first.

          is maxIntraQueuePreemptableLimit set to 50% a reasonable default? it seems maybe high?

          Yes. Decreased it to 20% now.

          Thanks Eric Payne. Agreed that this patch mainly concentrates on solving a single user's priority-inversion problem.
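
          For illustration, a minimal sketch, in code, of setting the related configuration knobs. The property keys below are assumptions quoted from memory and should be verified against CapacitySchedulerConfiguration before relying on them:

            import org.apache.hadoop.conf.Configuration;

            public class IntraQueuePreemptionTuning {
              public static void main(String[] args) {
                Configuration conf = new Configuration();
                // Assumed key: enable intra-queue preemption in the monitor policy.
                conf.setBoolean(
                    "yarn.resourcemanager.monitor.capacity.preemption."
                        + "intra-queue-preemption.enabled", true);
                // Assumed key: cap intra-queue preemption at 20%, matching the
                // reduced default discussed above.
                conf.setFloat(
                    "yarn.resourcemanager.monitor.capacity.preemption."
                        + "intra-queue-preemption.max-allowable-limit", 0.2f);
              }
            }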

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 6m 42s trunk passed
          +1 compile 0m 32s trunk passed
          +1 checkstyle 0m 22s trunk passed
          +1 mvnsite 0m 38s trunk passed
          +1 mvneclipse 0m 17s trunk passed
          +1 findbugs 0m 58s trunk passed
          +1 javadoc 0m 20s trunk passed
          +1 mvninstall 0m 31s the patch passed
          +1 compile 0m 30s the patch passed
          +1 javac 0m 30s the patch passed
          -0 checkstyle 0m 21s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 21 new + 173 unchanged - 35 fixed = 194 total (was 208)
          +1 mvnsite 0m 34s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          -1 whitespace 0m 0s The patch has 7 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          +1 findbugs 1m 2s the patch passed
          +1 javadoc 0m 18s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager generated 0 new + 937 unchanged - 1 fixed = 937 total (was 938)
          +1 unit 38m 45s hadoop-yarn-server-resourcemanager in the patch passed.
          +1 asflicense 0m 15s The patch does not generate ASF License warnings.
          53m 55s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue YARN-2009
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12835857/YARN-2009.0016.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux e037d2ec2ea2 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 8a9388e
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/13643/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          whitespace https://builds.apache.org/job/PreCommit-YARN-Build/13643/artifact/patchprocess/whitespace-eol.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13643/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13643/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          leftnoteasy Wangda Tan added a comment -

          Thanks Sunil G,

          +1 to the latest patch; will check this in next Monday if there are no more opposing opinions.

          leftnoteasy Wangda Tan added a comment -

          Committed to trunk/branch-2.

          Thanks Sunil G for working on this JIRA, thanks Carlo Curino for sharing so many valuable thoughts, and especially thanks Eric Payne for the thorough review and tests!

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10737 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10737/)
          YARN-2009. CapacityScheduler: Add intra-queue preemption for app (wangda: rev 90dd3a8148468ac37a3f2173ad8d45e38bfcb0c9)

          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/PreemptionCandidatesSelector.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicyMockFramework.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/FifoCandidatesSelector.java
          • (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/AbstractPreemptionEntity.java
          • (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TempAppPerPartition.java
          • (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/IntraQueuePreemptionComputePlugin.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
          • (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/IntraQueueCandidatesSelector.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TempQueuePerPartition.java
          • (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicyIntraQueue.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
          • (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/AbstractPreemptableResourceCalculator.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/PreemptableResourceCalculator.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
          • (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/FifoIntraQueuePreemptionPlugin.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/CapacitySchedulerPreemptionContext.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/CapacitySchedulerPreemptionUtils.java
          sunilg Sunil G added a comment -

          Thank you very much Wangda Tan for thorough review/commit and suggestions. And special thanks to Eric Payne for the valuable thoughts and thorough tests/reviews, really appreciate the same. Also special thanks to Carlo Curino for sharing valuable inputs here.

          Show
          sunilg Sunil G added a comment - Thank you very much Wangda Tan for thorough review/commit and suggestions. And special thanks to Eric Payne for the valuable thoughts and thorough tests/reviews, really appreciate the same. Also special thanks to Carlo Curino for sharing valuable inputs here.
          Hide
          jianhe Jian He added a comment -

          One question; not sure if this is expected.
          This code synchronizes on the leafQueue object, while we already changed LeafQueue to use a read/write lock. Should we change it to use the read lock? Similar thing for the FifoCandidatesSelector.

                  synchronized (leafQueue) {
                    Iterator<FiCaSchedulerApp> desc = leafQueue.getOrderingPolicy()
                        .getPreemptionIterator();
                    while (desc.hasNext()) {
                      FiCaSchedulerApp app = desc.next();
                      preemptFromLeastStarvedApp(selectedCandidates, clusterResource,
                          totalPreemptedResourceAllowed, resToObtainByPartition,
                          leafQueue, app);
                    }
                  }
          
          sunilg Sunil G added a comment -

          Thank you Jian He.
          We also have similar code in the existing inter-queue preemption path, in FifoCandidatesSelector.selectCandidates. I think we can change both to a read lock. I think we can handle that in another ticket since it affects the existing preemption as well. Thoughts?
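
          For reference, a minimal sketch of what the read-lock variant of the snippet above might look like. It assumes LeafQueue exposes its read lock through a hypothetical getReadLock() accessor (the real lock field is protected), so this is illustrative rather than a drop-in change:

                  // Sketch: take the queue's read lock instead of synchronizing on the
                  // LeafQueue object while walking the preemption ordering.
                  java.util.concurrent.locks.Lock queueReadLock = leafQueue.getReadLock();
                  queueReadLock.lock();
                  try {
                    Iterator<FiCaSchedulerApp> desc = leafQueue.getOrderingPolicy()
                        .getPreemptionIterator();
                    while (desc.hasNext()) {
                      FiCaSchedulerApp app = desc.next();
                      preemptFromLeastStarvedApp(selectedCandidates, clusterResource,
                          totalPreemptedResourceAllowed, resToObtainByPartition,
                          leafQueue, app);
                    }
                  } finally {
                    queueReadLock.unlock();
                  }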

          eepayne Eric Payne added a comment -

          Hi Wangda Tan and Sunil G. I would like this feature to be backported to branch-2.8. In order for this to happen, we need to backport the prerequisites, YARN-4108 and YARN-4822. I have branch-2.8 patches for both of them. Can I backport these?

          Would you like me to post the patches on the respective JIRAs or should I just backport them directly? There were a few differences, but nothing major.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Great persistent effort here, kudos Sunil G, Eric Payne and Wangda Tan!

          leftnoteasy Wangda Tan added a comment -

          Eric Payne, I'm OK with backporting all required patches, including YARN-4108/YARN-4822 and even YARN-4390.

          However, there's an ongoing discussion in the community about whether it is better to re-branch branch-2.8 from branch-2; with that, you wouldn't need all these backports. Would that be an option for you? +Jason Lowe.

          Thanks,
          Wangda

          eepayne Eric Payne added a comment -

          Good point. There's no hurry, so we should wait for a few days to see if that discussion gets resolved.

          eepayne Eric Payne added a comment -

          Hey Wangda Tan. The fate of branch-2.8 is still up in the air, but at this point the discussion seems to be leaning towards leaving branch-2.8 as its original branch (and not re-branching from the head of branch-2). I would like to go ahead and do the backport of YARN-4108/YARN-4822 and YARN-4390 to branch-2.8, regardless. Any objections?

          leftnoteasy Wangda Tan added a comment -

          Eric Payne, I'm OK with backporting all required preemption related changes to branch-2.8, please go ahead.

          Thanks

          eepayne Eric Payne added a comment -

          I'm backporting this to 2.8.

          Reopening so that I can put it back into Patch Available, which will kick off a build.

          eepayne Eric Payne added a comment -

          Sunil G and Wangda Tan.

          I'm attaching the first cut at backporting priority-based intra-queue preemption to branch-2.8.

          I have tested this in the 2.8 branch, and it seems to work. However, please pay close attention to PreemptionCandidatesSelector.java and FiCaSchedulerApp.java. In trunk and branch-2, they both depend on SchedulerKey, which I think was introduced in YARN-5392. I don't think we want to backport YARN-5392.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 24s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 8m 56s branch-2.8 passed
          +1 compile 0m 29s branch-2.8 passed with JDK v1.8.0_111
          +1 compile 0m 32s branch-2.8 passed with JDK v1.7.0_121
          +1 checkstyle 0m 21s branch-2.8 passed
          +1 mvnsite 0m 38s branch-2.8 passed
          +1 mvneclipse 0m 18s branch-2.8 passed
          +1 findbugs 1m 10s branch-2.8 passed
          +1 javadoc 0m 20s branch-2.8 passed with JDK v1.8.0_111
          +1 javadoc 0m 23s branch-2.8 passed with JDK v1.7.0_121
          +1 mvninstall 0m 30s the patch passed
          +1 compile 0m 24s the patch passed with JDK v1.8.0_111
          +1 javac 0m 24s the patch passed
          +1 compile 0m 29s the patch passed with JDK v1.7.0_121
          +1 javac 0m 29s the patch passed
          -0 checkstyle 0m 17s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 21 new + 155 unchanged - 33 fixed = 176 total (was 188)
          +1 mvnsite 0m 34s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          -1 whitespace 0m 0s The patch has 7 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          -1 findbugs 1m 18s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          +1 javadoc 0m 16s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_111 with JDK v1.8.0_111 generated 0 new + 974 unchanged - 1 fixed = 974 total (was 975)
          +1 javadoc 0m 21s the patch passed with JDK v1.7.0_121
          -1 unit 73m 40s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_121.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          165m 46s



          Reason Tests
          FindBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
            Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.orderingPolicy; locked 86% of time; unsynchronized access at LeafQueue.java:[line 1820]
          JDK v1.8.0_111 Failed junit tests hadoop.yarn.server.resourcemanager.TestAMAuthorization
            hadoop.yarn.server.resourcemanager.TestClientRMTokens
          JDK v1.7.0_121 Failed junit tests hadoop.yarn.server.resourcemanager.TestAMAuthorization
            hadoop.yarn.server.resourcemanager.TestClientRMTokens



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:5af2af1
          JIRA Issue YARN-2009
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12842625/YARN-2009.branch-2.8.001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux d13d5ccf9ccd 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2.8 / e8b0ef0
          Default Java 1.7.0_121
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_111 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14247/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          whitespace https://builds.apache.org/job/PreCommit-YARN-Build/14247/artifact/patchprocess/whitespace-eol.txt
          findbugs https://builds.apache.org/job/PreCommit-YARN-Build/14247/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.html
          unit https://builds.apache.org/job/PreCommit-YARN-Build/14247/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_121.txt
          JDK v1.7.0_121 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14247/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14247/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          sunilg Sunil G added a comment -

          Yes, we do not need to backport YARN-5392. I think the backport patch looks good.
          I will also try applying the patch and do some checks, and will update with my feedback.

          sunilg Sunil G added a comment -

          Changes look good to me. +1

          eepayne Eric Payne added a comment -

          I noticed that findbugs is complaining about inconsistently synchronized orderingPolicy. So, I'm attaching a new patch that addresses that.
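
          For context, a rough sketch of the kind of change that clears such a FindBugs warning on branch-2.8 (where LeafQueue still relies on synchronized methods rather than the read/write lock): route every access to orderingPolicy through synchronized accessors so the field is locked consistently. Illustrative only, not the exact committed diff:

            // In LeafQueue (sketch): synchronize both accessors so FindBugs sees
            // consistent locking on the orderingPolicy field.
            public synchronized OrderingPolicy<FiCaSchedulerApp> getOrderingPolicy() {
              return orderingPolicy;
            }

            public synchronized void setOrderingPolicy(
                OrderingPolicy<FiCaSchedulerApp> orderingPolicy) {
              this.orderingPolicy = orderingPolicy;
            }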

          eepayne Eric Payne added a comment -

          Cancelling and re-submitting the patch to see if it will kick off the 2.8 build.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 20s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 8m 43s branch-2.8 passed
          +1 compile 0m 29s branch-2.8 passed with JDK v1.8.0_111
          +1 compile 0m 31s branch-2.8 passed with JDK v1.7.0_121
          +1 checkstyle 0m 21s branch-2.8 passed
          +1 mvnsite 0m 38s branch-2.8 passed
          +1 mvneclipse 0m 18s branch-2.8 passed
          +1 findbugs 1m 9s branch-2.8 passed
          +1 javadoc 0m 21s branch-2.8 passed with JDK v1.8.0_111
          +1 javadoc 0m 23s branch-2.8 passed with JDK v1.7.0_121
          +1 mvninstall 0m 31s the patch passed
          +1 compile 0m 24s the patch passed with JDK v1.8.0_111
          +1 javac 0m 24s the patch passed
          +1 compile 0m 29s the patch passed with JDK v1.7.0_121
          +1 javac 0m 29s the patch passed
          -0 checkstyle 0m 17s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 22 new + 153 unchanged - 34 fixed = 175 total (was 187)
          +1 mvnsite 0m 35s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          -1 whitespace 0m 0s The patch has 7 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          +1 findbugs 1m 19s the patch passed
          +1 javadoc 0m 17s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_111 with JDK v1.8.0_111 generated 0 new + 974 unchanged - 1 fixed = 974 total (was 975)
          +1 javadoc 0m 22s the patch passed with JDK v1.7.0_121
          -1 unit 77m 29s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_121.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          173m 45s



          Reason Tests
          JDK v1.8.0_111 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization
          JDK v1.7.0_121 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerLazyPreemption
            hadoop.yarn.server.resourcemanager.TestAMAuthorization



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:5af2af1
          JIRA Issue YARN-2009
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12842878/YARN-2009.branch-2.8.002.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 2c998f8d859b 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2.8 / 01b50b3
          Default Java 1.7.0_121
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_111 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14273/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          whitespace https://builds.apache.org/job/PreCommit-YARN-Build/14273/artifact/patchprocess/whitespace-eol.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/14273/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_121.txt
          JDK v1.7.0_121 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14273/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/14273/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          eepayne Eric Payne added a comment -

          Committing to branch-2.8. Thanks Sunil G and Wangda Tan


            People

            • Assignee:
              sunilg Sunil G
            • Reporter:
              devaraj.k Devaraj K
            • Votes:
              0
            • Watchers:
              23
