  1. Hadoop YARN
  2. YARN-1963

Support priorities across applications within the same queue

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: api, resourcemanager
    • Labels:
      None

      Description

      It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory.

        Issue Links

        1. Support for Application priority : Changes in RM and Capacity Scheduler (Sub-task, Resolved, Sunil G)
        2. Priority scheduling support in Capacity scheduler (Sub-task, Resolved, Sunil G)
        3. App priority support in Fair Scheduler (Sub-task, Open, Wei Yan)
        4. Preemption in fair scheduler should consider app priorities (Sub-task, Open, Wei Yan)
        5. User level API support for priority label (Sub-task, Resolved, Rohith Sharma K S)
        6. Priority Label Manager in RM to manage application priority based on configuration (Sub-task, Resolved, Sunil G)
        7. Server side PB changes for Priority Label Manager and Admin CLI support (Sub-task, Resolved, Sunil G)
        8. Support admin cli interface in for Application Priority (Sub-task, Resolved, Rohith Sharma K S)
        9. FIFO scheduler doesn't respect changing job priority (Sub-task, Resolved, Rohith Sharma K S)
        10. Support for changing Application priority during runtime (Sub-task, Resolved, Sunil G)
        11. Display Application Priority in RM Web UI (Sub-task, Resolved, Sunil G)
        12. Support for application priority ACLs in queues of CapacityScheduler (Sub-task, Resolved, Sunil G)
        13. REST api support for Application Priority (Sub-task, Resolved, Naganarasimha G R)
        14. Support user cli interface in for Application Priority (Sub-task, Resolved, Rohith Sharma K S)
        15. Publish Application Priority to TimelineServer (Sub-task, Resolved, Sunil G)
        16. Render cluster Max Priority in scheduler metrics in RM web UI (Sub-task, Resolved, Rohith Sharma K S)
        17. Document ApplicationPriority feature (Sub-task, Resolved, Rohith Sharma K S)
        18. Runtime Application Priority change should not throw exception for applications at finishing states (Sub-task, Resolved, Sunil G)
        19. Retrospect update ApplicationPriority API return type (Sub-task, Resolved, Rohith Sharma K S)
        20. AM need to be notified with priority in AllocateResponse (Sub-task, Resolved, Sunil G)
        21. Retrospect app-priority in pendingOrderingPolicy during recovering applications (Sub-task, Resolved, Rohith Sharma K S)
        22. Pull out priority comparison from fifocomparator and use compound comparator for FifoOrdering policy (Sub-task, Resolved, Rohith Sharma K S)
        23. updateApplicationPriority api in scheduler should ensure to re-insert app to correct ordering policy (Sub-task, Resolved, Bibin A Chundatt)
        24. Retrospect updateApplicationPriority api to handle state store exception in align with YARN-5611 (Sub-task, Resolved, Sunil G)

          Activity

          sunilg Sunil G added a comment -

          We have done some analysis and implemented support for application priority.
          I would like to share our thoughts here; please take a look.

          Design thoughts:
          1. Configuration Part
          We planned to use some existing priority configuration as given below. These are used to set a Job priority.
          a. JobConf.getJobPriority() and Job.setPriority(JobPriority priority)
          b. We can also use configuration "mapreduce.job.priority".

          The values for priority can be VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW

          2. Scheduler Side
          If a Capacity Scheduler queue has multiple applications (jobs) to run with different priorities, CS will allocate containers for the highest-priority application first, then for the next priority, and so on.
          When multiple queues are configured with different capacities, this priority applies only within each queue.

          For this, we planned to add a priority comparison check in the below data structure.
          Comparator<FiCaSchedulerApp> applicationComparator

          We added a priority check in compare() of applicationComparator, which is used while selecting applications. The updated logic is:
          1. Check priority first. If priorities are set, prefer the highest-priority job.
          2. Otherwise, continue with the existing logic, such as the App ID and timestamp comparisons.

          With these changes, the highest-priority job gets preference within a queue.
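The two-step ordering described above could be sketched roughly as follows. This is only an illustration, not the patch itself: `SchedulableApp` is a hypothetical stand-in for `FiCaSchedulerApp`, and it assumes that a larger integer means a higher priority and that the existing tie-breakers are the application ID followed by the submission timestamp.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for FiCaSchedulerApp; it holds only what the comparison needs.
final class SchedulableApp {
    final int priority;    // assumption: larger value means higher priority
    final long appId;      // assumption: IDs grow with submission order
    final long submitTime; // submission timestamp in milliseconds

    SchedulableApp(int priority, long appId, long submitTime) {
        this.priority = priority;
        this.appId = appId;
        this.submitTime = submitTime;
    }
}

final class PriorityFirstOrdering {
    // Step 1: higher priority first. Step 2: fall back to app ID, then timestamp.
    static final Comparator<SchedulableApp> COMPARATOR =
        Comparator.comparingInt((SchedulableApp a) -> a.priority).reversed()
            .thenComparingLong(a -> a.appId)
            .thenComparingLong(a -> a.submitTime);

    // Returns a sorted copy so the caller's list is left untouched.
    static List<SchedulableApp> order(List<SchedulableApp> apps) {
        List<SchedulableApp> sorted = new ArrayList<>(apps);
        sorted.sort(COMPARATOR);
        return sorted;
    }
}
```

Apps with equal priority keep the pre-existing order, so a queue where nobody sets a priority behaves as before.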

          NB: In addition, we added a preemption module so that high-priority jobs can get resources quickly by preempting lower-priority ones.

          I wish to upload a patch if this approach is fine.

          sandyr Sandy Ryza added a comment -

          Thanks for picking this up Sunil. Can we separate this into a couple JIRAs? One for the ResourceManager and protocol changes, one for the MapReduce changes, and one for the Capacity Scheduler changes.

          sunilg Sunil G added a comment -

          Thank you Sandy for the review.
          As you suggested, I will create these subtasks and handle them separately.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Sunil G, thanks for taking this up. This is a really useful feature!

          Before we jump into patches, we should consider writing up a small design doc that describes the requirements and the mechanism of implementation - not necessarily class-level design. There are a few things to consider off the top of my head:

          • Values of priorities - static values like you described before, or a few known priorities backed by integers, leaving gaps for more powerful interaction with priorities
          • ACLs on priorities - If we don't have some such mechanism, users will all be incentivized to submit apps all with the highest priority.
          • The classic priority inversion problem: MAPREDUCE-314

          I am sure there are more things to consider once we start thinking through this. I can help write this down, let me know what you think.

          acmurthy Arun C Murthy added a comment -

          Sunil G thanks for taking this up!

          As Vinod Kumar Vavilapalli mentioned, a short writeup will help. I look forward to helping get this in; thanks again!

          rohithsharma Rohith Sharma K S added a comment -

          Adding to Sunil's thoughts, the priority of jobs can also be displayed in the RM web UI.

          raviprak Ravi Prakash added a comment -

          I wonder if it'd be a good idea to percolate the priorities onto the actual containers as well (I'm thinking of (re)nice-ing container processes). That way we can submit more jobs than can all fit into memory and take advantage of OS scheduling to pick up the ones with the highest priority.

          maysamyabandeh Maysam Yabandeh added a comment -

          I was wondering what the long-term plan for this jira is. It does not seem to have had any activity in the past 4 months. Is there a rough estimate of which release this feature is planned for?

          sunilg Sunil G added a comment -

          Hi Maysam Yabandeh, we are preparing a design doc that captures all the details and will publish it soon. Vinod Kumar Vavilapalli, could we discuss the doc offline and then publish it?

          sunilg Sunil G added a comment -

          Hi All

          I am uploading an initial draft for Application Priority design. Kindly review the same and share your thoughts. I am planning to bring up the subjiras by end of week and after a round of review.

          Thank you Vinod Kumar Vavilapalli for support.

          maysamyabandeh Maysam Yabandeh added a comment -

          Thanks Sunil G for the design doc.

          It might be useful if I share with you our use cases.

          Our most important use case is to let the admin change an app priority at runtime while it is running. The example is when a job gets unlucky taking much longer than usual due to some node failures or bugs. The user complains that the job is about to miss the deadline and admin needs a way to prioritize the user's job over the other jobs in the queue. This use case seems to be mentioned in Item 3 of Section 1.5.3 in the design doc but its "priority" seems not to be high.

          Another use case is to dynamically give a job higher priority based on the job status. For example, when a mapper fails and there is no headroom in the queue, the job preempts its reducers to make space for its mappers. The freed space, however, is not necessarily offered back to the job under fair scheduling. Ideally, a job could increase its priority when its reducers are stalled waiting for its mappers to be assigned.

          Once all these requests of higher priority applications are served, then lower priority application requests will get served from Resource Manager.

          We are using the fair scheduler, and I assumed this jira also covers it since YARN-2098 was created as a sub-task. The design doc, however, seems fairly centered on CapacityScheduler. In the case of the fair scheduler, I guess priority could also be incorporated into the fair share calculation instead of a strict high-priority-first order.

          sunilg Sunil G added a comment -

          Thank you Maysam Yabandeh for providing us the use cases.

          1.

          use case seems to be mentioned in Item 3 of Section 1.5.3

          Yes. Changing the priority of an application at runtime will help overcome the scenario you mentioned. I will incorporate this by providing more scenarios and their impacts.
          2.

          priority can also be incorporated to the fair share calculation

          Application Priority will be supported by both schedulers. Sub-jiras are already open for this; we can realign them with the same base design, and I will include the Fair Scheduler changes as well. As of now, priority labels and the internal implementation will be common, but separate ACL and per-queue priority-label configurations will be required at the scheduler level. In the future, when both schedulers share the same config and common code, this can be pulled out as common code. For now, the configuration and its specific implementation can be done separately for each scheduler. Sub-jiras will be split accordingly.

          sunilg Sunil G added a comment -

          Attached updated design doc capturing comments.

          Thank you.

          sunilg Sunil G added a comment -

          Updating document with added information about user/admin level api information.
          Also creating initial set of jiras for user/admin apis and priority-label manager to manage priority labels in RM.

          eepayne Eric Payne added a comment -

          Thanks a lot Sunil G for taking the lead on this and putting together the design document. I have a question about per-priority ACLs.

          Can per-priority ACLs within a queue be inherited from queue-level ACLs if the per-priority ACLs aren't there? In a cluster that already has queues divided to be specific to business units, they will want to only specify the queue-level ACL list. In other words, in this use case, the queue-level users are already trusted enough to modify apps on that queue, regardless of priority, so they won't want the extra overhead of specifying additional priority-level ACLs. Is that part of the design?
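The fallback being asked about could look something like the sketch below. It is purely illustrative and not taken from the design doc: `QueuePriorityAcls` and its fields are hypothetical names, and it assumes ACLs are plain sets of user names keyed by priority label.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of one queue's ACLs; not a YARN class.
final class QueuePriorityAcls {
    private final Set<String> queueAcl;                   // queue-level users
    private final Map<String, Set<String>> priorityAcls;  // optional per-priority users

    QueuePriorityAcls(Set<String> queueAcl, Map<String, Set<String>> priorityAcls) {
        this.queueAcl = new HashSet<>(queueAcl);
        this.priorityAcls = new HashMap<>(priorityAcls);
    }

    // If no ACL is configured for the priority label, inherit the queue-level ACL.
    boolean canSubmit(String user, String priorityLabel) {
        Set<String> acl = priorityAcls.get(priorityLabel);
        if (acl == null) {
            acl = queueAcl;
        }
        return acl.contains(user);
    }
}
```

With this shape, a cluster that configures only queue-level ACLs needs no extra priority settings, while a queue that does define an ACL for a label restricts that label to the listed users.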

          leftnoteasy Wangda Tan added a comment -

          Thanks Sunil G, Vinod Kumar Vavilapalli for your great effort on this!
          I've just read through the design doc, some comments:

          1) yarn.app.priority
          How will this be implemented? Does this mean a YARN application can specify a priority at submission through the yarn CLI without changing a line of its code?
          I think if this can be done, we should extend it to other YARN parameters like queue, node-label-expression, etc.

          2) Specify only highest priority for queue and user
          I found there are properties like yarn.scheduler.root.<queue_name>.priority_label=high,low and yarn.scheduler.capacity.root.<queue_name>.<priority_label>.acl=user1,user2.
          I would prefer to specify only the highest priority for a queue and user. For example, it doesn't make sense to me if priority = {high,mid,low} and a queue can access only {high,low}. Is there any benefit to specifying individual priorities instead of the highest priority?

          3) User limit and priority
          I think we shouldn't consider user limit within a priority level, because priority is not a specific kind of resource. Compare it with node labels: you cannot say user-X of queue-A used 8G of highest-priority resource, but you can say user-X of queue-A used 8G of resource on nodes with label=GPU. There's no difference between a 2G resource allocated at the highest priority and one allocated at the lowest.

          If we want to implement this,

          it will not be fair to schedule resources in a uniform manner for all application in a queue with respect to user limits.

          I suggest adding preemption within a queue that considers priority. Building on YARN-2069, we can consider user limit and priority together: while enforcing user limit, we always preempt from lower-priority applications.
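A rough sketch of "preempt from lower-priority applications first" within a single queue is shown below. It is a deliberate simplification and not how YARN-2069 works: it ignores user limits entirely, treats each application as a single block of memory, and uses hypothetical names (`RunningApp`, `VictimSelector`), with larger integers assumed to mean higher priority.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical view of an application currently holding resources in the queue.
final class RunningApp {
    final String id;
    final int priority; // assumption: larger value means higher priority
    final int usedMb;   // memory currently held by the app

    RunningApp(String id, int priority, int usedMb) {
        this.id = id;
        this.priority = priority;
        this.usedMb = usedMb;
    }
}

final class VictimSelector {
    // Picks apps to preempt from, lowest priority first, until neededMb is covered.
    // Apps at or above the requester's priority are never selected.
    static List<String> select(List<RunningApp> apps, int requesterPriority, int neededMb) {
        List<RunningApp> candidates = new ArrayList<>();
        for (RunningApp app : apps) {
            if (app.priority < requesterPriority) {
                candidates.add(app);
            }
        }
        candidates.sort(Comparator.comparingInt((RunningApp a) -> a.priority));

        List<String> victims = new ArrayList<>();
        int freedMb = 0;
        for (RunningApp app : candidates) {
            if (freedMb >= neededMb) {
                break;
            }
            victims.add(app.id);
            freedMb += app.usedMb;
        }
        return victims;
    }
}
```

A real policy would pick individual containers rather than whole applications and would combine this ordering with the user-limit accounting discussed in this comment.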

          Any thoughts?

          Thanks,
          Wangda

          sunilg Sunil G added a comment -

          Hi Wangda,

          Thanks for sharing your comments.

          Does this means, any YARN application doesn't need change a line of their code,

          yarn.app.priority can be passed from the client side. If the client sets the priority value read from this config on the ApplicationSubmissionContext, then the RM can get it. All we need is a YarnClient implementation that takes this config and sets it on the ApplicationSubmissionContext (similar to the queue name the app is submitted to).

          Specify only highest priority for queue and user

          The idea sounds good. The reason for specifying each label needed for a queue is that the admin can specify the labels applicable to that queue. With only a highest priority, we may always end up accepting lower priorities by default. How do you feel about having this as a range "low-high"?

          cluster labels {very_high, high, medium, low}
          yarn.scheduler.root.<queue_name>.priority_label=low-high
          yarn.scheduler.capacity.root.<queue_name>.high.acl=user1,user2
          yarn.scheduler.capacity.root.<queue_name>.low.acl=user3,user4
          

          This was the intention. Please share your thoughts, Vinod Kumar Vavilapalli and Wangda Tan.

          I think we shouldn't consider user limit within priority level

          I have a use case here. A few applications are running in a queue from 4 different users (submitted at priority level low), and the user-limit factor is 20. A 5th user has the ACL for submitting high-priority applications. Because of the user limit, he can get at most 20% for his high-priority apps. These high-priority apps submitted by user5 may need more resources, which in turn will be rejected by the user-limit check. How do you feel about this use case?

          I suggest to add preemption within queue considering priority.

          +1. Already filed a subjira for this.

          leftnoteasy Wangda Tan added a comment -

          Sunil G,
          Thanks for reply,

          All we need a YarnClient implementation for taking this config and setting to ApplicationSubmissionContext. ( Something similar to queue name which this app is submitted to ).

          Yes, that will be helpful. I want to make sure: they're not in YarnClient now (including the queue)? I didn't see any related code in YarnClient.

          The idea sounds good. The reason for specifying each label needed for a queue is because admin can specify the labels applicable for a queue. With high priority, we may always end up having default acceptance of lower priorities. How do you feel about having this as a range "low-high"

          Instead of a low-high range, I'd prefer highest + default priority. The admin can specify the highest priority for a queue/user, and the default priority for a queue/user.
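As a minimal sketch of the "highest + default" idea, the resolution at submission time might look like this. The class and method names are hypothetical, priorities are assumed to be integers where larger means higher, and the choice to cap an over-limit request at the highest allowed value (rather than reject it) is an assumption that this thread does not settle.

```java
import java.util.OptionalInt;

// Hypothetical per-queue (or per-user) policy; not a YARN class.
final class HighestPlusDefaultPolicy {
    private final int highestPriority;
    private final int defaultPriority;

    HighestPlusDefaultPolicy(int highestPriority, int defaultPriority) {
        if (defaultPriority > highestPriority) {
            throw new IllegalArgumentException("default priority must not exceed highest priority");
        }
        this.highestPriority = highestPriority;
        this.defaultPriority = defaultPriority;
    }

    // No requested priority: use the default. Otherwise cap at the highest allowed.
    int resolve(OptionalInt requested) {
        if (!requested.isPresent()) {
            return defaultPriority;
        }
        return Math.min(requested.getAsInt(), highestPriority);
    }
}
```

Every priority at or below the highest value is implicitly accepted here, which is the "default acceptance of lower priorities" trade-off raised earlier in the discussion.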

          I have a use case scenario here. There are few applications running in a queue from 4 different users (sub...

          I understand the use case here, but I think an easier way may be to not change the definition of user limit, for example by having a preemption mechanism that lets higher-priority applications take resources from lower-priority applications. Dividing user limit by priority will add extra complexity in both implementation and configuration.

          I suggest to add preemption within queue considering priority. ... +1. Already filed a subjira for this.

          The preemption I mentioned here is not YARN-2009. It is meant to support the use case you mentioned: we can keep user-limit as-is and ensure that a higher-priority application can get resources. That should be possible.

          Thanks,
          Wangda

          sunilg Sunil G added a comment -

          Thank you Wangda Tan

          I'd prefer highest + default priority.

          This configuration will make things easier for admins. I am still not fully convinced about lower priorities being accepted by default, but I also don't see any use case where these lower priorities are a problem. Yes, we can have this as highest + default (the default I already have). Instead of labels per queue, it will be changed to highest per queue. I will update the doc accordingly, and also my patch.

          extra complexity both in implementation and configuration

          I agree that this part would complicate both configuration and implementation. As you mentioned, if a preemption feature related to YARN-2069 runs in parallel, the issue I pointed out can be solved: if user-limit-factor preemption also considers priority, we can get the headroom that is needed. The user has to enable this preemption, though. If this workaround is acceptable for the issue mentioned, I will file a JIRA to relate priority to user-limit preemption. Kindly share your thoughts.

          I didn't see any related code in YarnClient

          Yes, this code currently lives in YarnRunner, which is part of MapReduce. I would like to see it in YarnClient.

          sunilg Sunil G added a comment -

          Updated design doc as per the comments from Tan, Wangda

          leftnoteasy Wangda Tan added a comment -

          Sunil G,
          I agree with your latest comment,
          Will get back to you once I read the new design doc.

          Thanks,

          eepayne Eric Payne added a comment -

          Hi Sunil G. Thanks for the work you are doing on this issue.

          yarn.scheduler.capacity.root.<queue_name>.<priority_label>.acl

          If this property doesn't exist, will queue admins still be able to change priorities of jobs in the queue?

          sunilg Sunil G added a comment -

          Thank you Wangda and Eric Payne for the comments.

          A priority-label ACL, if configured for a queue, will be checked before the job is submitted. If there is no such configuration, only the queue ACL will be checked, which is the same as what happens now. The priority-label ACL is an extra check on top of the queue-level ACL, and admins can configure it as needed.
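The check order described here can be sketched roughly as follows. This is only an illustration in Python, not the RM code: `can_submit`, its parameters, and the plain sets of user names are all hypothetical, and real YARN ACLs also cover groups and wildcards, which this sketch ignores.

```python
def can_submit(user, queue_acl, priority_acls, priority):
    """Return True if `user` may submit to the queue at `priority`.

    queue_acl     -- set of users allowed on the queue (hypothetical shape)
    priority_acls -- dict mapping a priority label to a set of allowed users;
                     a label missing from the dict means "no ACL configured"
    """
    # The queue-level ACL is always checked, as it is today.
    if user not in queue_acl:
        return False

    # The priority-level ACL is an extra check, applied only when configured.
    allowed = priority_acls.get(priority)
    if allowed is None:
        return True
    return user in allowed


queue_acl = {"alice", "bob"}
priority_acls = {"high": {"alice"}}

print(can_submit("alice", queue_acl, priority_acls, "high"))
print(can_submit("bob", queue_acl, priority_acls, "high"))
print(can_submit("bob", queue_acl, priority_acls, "low"))
```

With no entry for a label, the function falls back to the queue ACL alone, which matches the "same as what happens now" behaviour for clusters that never configure priority ACLs.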

          sunilg Sunil G added a comment -

          As per the discussion in YARN-2896 with Eric Payne and Wangda Tan, there is a proposal to use an integer alone as the priority, both on the client and in the server. In the design doc, a priority label acted as a user-facing wrapper, and the server internally used the corresponding integer. We can continue the discussion here in the parent JIRA. Looping in Vinod Kumar Vavilapalli.

          Current idea:

          yarn.prority-labels = low:2, medium:4, high:6
          

          Proposed:

          yarn.application.priority = 2, 3 , 4
          

          Thank you for sharing your thoughts. I will now upload the scheduler changes, which can be reviewed in the meantime.

          leftnoteasy Wangda Tan added a comment -

          Thanks for the summary, Sunil G. I think priority should be a range instead of a set of numbers. Maybe we can follow how Linux does it: the range is [-N, +N], and 0 is the default priority.

          eepayne Eric Payne added a comment -

          +1 on using numbers and not labels. It seems that the use of labels adds more complexity in mapping, sending via PB, and converting back to numbers, and does not seem to add much clarity.

          sunilg Sunil G added a comment -

          Yes, Wangda Tan.
          Having a range will clearly define the subset of priorities each queue can accept.
          I will share a patch for this under the same sub-JIRAs. Hoping Vinod Kumar Vavilapalli can take a look at these latest changes.

          sunilg Sunil G added a comment -

          Attaching an updated version.

          sunilg Sunil G added a comment -

          Pls discard this message.

          sunilg Sunil G added a comment -

          Sorry. Due to some error in the webpage, the last message got posted many times.
          Kindly ignore.

          devaraj.k Devaraj K added a comment -

          I also agree with using numbers rather than labels, to avoid making this more complex. If we go with numbers, I think we can just use the existing priority API, ApplicationSubmissionContext.setPriority(Priority priority), and no new APIs need to be exposed to clients.

          We may need to think about the M/R job priority case. An M/R job supports enums for priority (i.e. VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW), and we need some mechanism to map these enums to priority numbers.
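One possible shape for that mapping is a small translation table. The sketch below is in Python for brevity and is only an assumption about how it could look: the `JobPriority` enum mirrors the five M/R names listed above, while the integer values and the `to_yarn_priority` helper are made up for illustration and are not part of any existing API.

```python
from enum import Enum


class JobPriority(Enum):
    """The five M/R priority names mentioned above."""
    VERY_LOW = "VERY_LOW"
    LOW = "LOW"
    NORMAL = "NORMAL"
    HIGH = "HIGH"
    VERY_HIGH = "VERY_HIGH"


# Hypothetical translation table: the integers are placeholders, chosen only
# so that a higher number means a higher priority.
MR_TO_YARN = {
    JobPriority.VERY_LOW: 1,
    JobPriority.LOW: 2,
    JobPriority.NORMAL: 3,
    JobPriority.HIGH: 4,
    JobPriority.VERY_HIGH: 5,
}


def to_yarn_priority(name):
    """Translate an M/R priority name (case-insensitive) to an integer."""
    return MR_TO_YARN[JobPriority[name.upper()]]


print(to_yarn_priority("normal"))
print(to_yarn_priority("VERY_HIGH"))
```

Keeping the table in one place means the scheduler only ever sees integers, and the enum names stay a client-side convenience.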

          sunilg Sunil G added a comment -

          Thank you Devaraj K for input

          I have updated the subjiras and uploaded patch by considering integer rather than label names.
          As mentioned, we can keep supporting the enums on the MR side. A translation table is needed for that, and it is better to keep it on the YarnClient side.

          leftnoteasy Wangda Tan added a comment -

          One more question: I didn't see an API proposed for updating app priority. I think it would be very useful when a job has been running for some time and needs to complete as soon as possible.

          Is this a valid use case that we need to handle within the scope of YARN-1963?

          jlowe Jason Lowe added a comment -

          I'd like to see changing app priorities addressed as it is a common ask from users. In many cases jobs are submitted to the cluster via some workflow/pipeline, and they would like to change the priority of apps already submitted. Otherwise they have to update their workflow/pipeline to change the submit-time priority, kill the active jobs, and resubmit the apps for the priority to take effect. Then eventually they need to change it all back to normal priorities later.

          sunilg Sunil G added a comment -

          Thank you Wangda and Jason for the input

          Yes, it's good to change the priority of an application at runtime. I had mentioned it in the design doc.
          I have already created a user API JIRA, and its client part can be handled there.

          leftnoteasy Wangda Tan added a comment -

          That's great! Thanks.

          sunilg Sunil G added a comment -

          Uploading a prototype version based on configuration file.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          As per discussion happened in YARN-2896 with Eric Payne and Wangda Tan, there is proposal to use Integer alone as priority from client and as well as in server. As per design doc, a priority label was used as wrapper for user and internally server was using corresponding integer with same. We can continue discussion on this here in parent JIRA. Looping Vinod Kumar Vavilapalli.
          Current idea:
          yarn.prority-labels = low:2, medium:4, high:6
          Proposed:
          yarn.application.priority = 2, 3 , 4

          Without some sort of labels, it will be very hard for users to reason about the definition and relative importance of priorities across queues and cluster. We must support the notion of priority-labels to make this feature usable in practice.

          nroberts Nathan Roberts added a comment -

          Without some sort of labels, it will be very hard for users to reason about the definition and relative importance of priorities across queues and cluster. We must support the notion of priority-labels to make this feature usable in practice.

          Maybe I'm missing something... Isn't it relatively easy to reason about 2<4 and therefore 2 is lower priority than 4? Unix/Linux hasn't had labels for priorities and it seems to be working pretty well there. Even if I have labels, I have to make sure that all queues and clusters define them precisely the same way or I wind up just as confused, if not even more. Just my $0.02

          sunilg Sunil G added a comment -

          Thank you Vinod Kumar Vavilapalli and Nathan Roberts for the comments.

          From a usability point of view, labels will be handy. The scheduler, however, must be agnostic of labels and should handle only integers, as in Linux. This adds complexity to a priority manager inside the RM, which will translate label -> integer and vice versa. We can make a call after looking at all the possibilities and standardize on it, so that a minimal working version can be pushed in by improving on the submitted patches (a working prototype was attached). Hoping Wangda Tan and Eric Payne will join the discussion.

          eepayne Eric Payne added a comment -

          Thanks, Sunil G, for your work on in-queue priorities.

          Along with Nathan Roberts, I'm confused about why priority labels are needed. As a user, I just need to know that the higher the number, the higher the priority. Then, I just need a way to see what priority each application is using and a way to set the priority of applications. To me, it just seems like labels will get in the way.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Assuming integers are supported

          • Do we have a range? Otherwise, nothing stops users from setting their priority to INTEGER_MAX and everybody scratching their heads.
          • If we have a range, which side is up? is -20 >20 like unix (isn't intuitive at all to me) or -20 < 20 (intuitive)?
          • Either way, it is an implicit decision that needs to be documented and told to users explicitly. Labels convey that without any of that.
          • What does a negative priority even mean?
          • Admin comes and says "I need a new super-high priority", now your ranges need to be dynamically size-able.

          I don't see a difference between, say, 10 priorities and 10 labeled priorities, other than that labels are better in the following ways:

          • They are more human readable on the UI and CLIs: "This app has priority 19" doesn't give as much feedback as "This app has HIGH priority"
          • Even if we don't want them now, you can let admins create new priorities between two existing ones, create a new priority lower than the lowest easily etc. With integers, you start with 0-10, then adding one more lower than them all takes them into negative priorities' territory, making it all confusing.
          • Specifying restrictions is very straightforward: for a root.engineering queue, VERY_HIGH can only be used by (u1, u2, g1), HIGH by (u3, u4) and everything else by everyone.

          The way I see it, we will provide a predefined set of labeled priorities that should work for 80% of the clusters, the remaining can define their own set.

          jlowe Jason Lowe added a comment -

          Do we have a range? Otherwise, nothing stops users from setting their priority to INTEGER_MAX and everybody scratching their heads.

          With any range, users could just set to the range max. The problem of having all the users set their priority to the highest is an orthogonal problem to how priorities are represented. As similarly suggested for labels, I think there should be a portion of the numerical range for highest priority reserved for admins.

          If we have a range, which side is up? is -20 >20 like unix (isn't intuitive at all to me) or -20 < 20 (intuitive)?

          I'd prefer higher priority values lead to higher priority in scheduling, but I don't care too much either way.

          What does a negative priority even mean?

          A priority in isolation is meaningless since it only derives semantic meaning when compared to another priority. This applies to labels as well. A job running at VERY_HIGH priority is not what you originally thought if everything else in the cluster is running at VERY_VERY_HIGH priority. So negative priorities are just a part of the numerical range, and it's straightforward to compare negative numbers as well as positive numbers and know their ordering. If it's still thought to be too confusing we can always limit priorities to >=0 without losing much.

          Admin comes and says "I need a new super-high priority", now your ranges need to be dynamically size-able.

          Like UNIX, it would be easy to add limitations to the high priority range so users can't just arbitrarily set their priorities to the highest level. Then admins will always have the ability to make something higher priority than what users were able to set.
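As a rough illustration of reserving the top of the range for admins, consider the sketch below. It is Python, and every name and number in it (`clamp_priority`, a user ceiling of 10, an admin ceiling of 15, a floor of 0) is a placeholder picked for the example rather than anything proposed in this JIRA.

```python
def clamp_priority(requested, is_admin, user_max=10, admin_max=15, minimum=0):
    """Clamp a requested priority into the range the caller may use.

    Ordinary users are capped at `user_max`; the slice (user_max, admin_max]
    stays reserved for admins, so an admin can always outrank any user job.
    """
    ceiling = admin_max if is_admin else user_max
    return max(minimum, min(requested, ceiling))


INTEGER_MAX = 2**31 - 1

print(clamp_priority(INTEGER_MAX, is_admin=False))  # capped at the user ceiling
print(clamp_priority(INTEGER_MAX, is_admin=True))   # capped at the admin ceiling
print(clamp_priority(7, is_admin=False))            # in range, left unchanged
```

Whether an out-of-range request should be clamped like this or rejected outright is a separate policy choice; the sketch clamps only because it is the shorter thing to show.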

          I don't mind if we want to have label mappings to numerical priorities, but my biggest concern with labels is putting them in the API itself. For example, take an app framework that runs multiple utility jobs simultaneously and needs to set the relative priorities between them. If label names themselves are in the API then that app framework doesn't work on some clusters that don't have the expected labels configured. Or another concern: if one has to use labels to specify priority, what if they really need 40 different priorities? Can they come up with that many descriptive label names such that users can reason about their relative ordering just based on the name? Most likely they will end up using label names like PRIORITY_1, PRIORITY_2, ... PRIORITY_50, with the pain of having to configure all of that.

          leftnoteasy Wangda Tan added a comment -

          I think label-based and integer-based priorities are just two different ways to express the same thing in configuration and in the API. Whether we choose label-based or integer-based priority, we should use integers only to implement the internal logic (like in CapacityScheduler).

          In addition, we can make a "label-based" priority just an alias of an "integer-based" priority. For example, if we define a queue's usable priorities to be [0-5], we need to add one "label" alias for every priority if we want to support label-based priority (such as VERY_LOW, LOW, NORMAL, HIGH, VERY_HIGH). When aliases are set, YARN can accept a label for priority and can show it on the web UI.

          With this, configuration/API and our internal logic could be consistent.

          For example, a label-based configuration:

          <priority-of-queue-root.engineering>
             <alias>
                 0:VERY_LOW, 1:LOW, 2:NORMAL, 3:HIGH, 4:VERY_HIGH
             </alias>

             <queue-defaults>
               <default>NORMAL</default>
               <max>HIGH</max>
             </queue-defaults>

             <user-setting users="u1, u2, g1">
               <default>HIGH</default>
               <max>VERY_HIGH</max>
             </user-setting>

             <user-setting users="u3, u4">
                ...
             </user-setting>
          </priority-of-queue-root.engineering>
          

          Which means:

          • Each integer has an alias
          • For users from "u1", "u2", "g1", their default application priority is HIGH, and they can manually set its priority up to VERY_HIGH
          • For other users, their default priority is NORMAL, and they can set its priority up to HIGH
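The rules in this list could be evaluated along the following lines. This is a Python sketch under stated assumptions, not scheduler code: the dictionaries simply transcribe the example values, the elided "u3, u4" block is left out because its contents are not given, "g1" is treated as a plain name even though it may be a group, and capping an over-the-limit request at the maximum (rather than rejecting it) is my own choice.

```python
# Alias table from the example: label -> integer.
ALIAS = {"VERY_LOW": 0, "LOW": 1, "NORMAL": 2, "HIGH": 3, "VERY_HIGH": 4}

# Queue-wide defaults and per-user overrides, transcribed from the example.
QUEUE_DEFAULTS = {"default": "NORMAL", "max": "HIGH"}
USER_SETTINGS = {
    ("u1", "u2", "g1"): {"default": "HIGH", "max": "VERY_HIGH"},
}


def effective_priority(user, requested=None):
    """Return the priority label an application from `user` ends up with."""
    # Use the user-specific block if one lists this user, else queue defaults.
    limits = next(
        (setting for users, setting in USER_SETTINGS.items() if user in users),
        QUEUE_DEFAULTS,
    )
    if requested is None:
        return limits["default"]
    # Compare through the integer aliases; cap at the allowed maximum.
    if ALIAS[requested] > ALIAS[limits["max"]]:
        return limits["max"]
    return requested


print(effective_priority("u1"))               # user-specific default
print(effective_priority("u9"))               # queue default
print(effective_priority("u9", "VERY_HIGH"))  # capped at the queue maximum
```

All comparisons go through the integers, so the same function works unchanged if the configuration is written with numbers and the alias table becomes an identity mapping.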

          The label-based configuration above is equivalent to this number-based one:

          <priority-of-queue-root.engineering>
             <queue-defaults>
               <default>2</default>
               <max>3</max>
             </queue-defaults>

             <user-setting users="u1, u2, g1">
               <default>3</default>
               <max>4</max>
             </user-setting>

             <user-setting users="u3, u4">
                ...
             </user-setting>
          </priority-of-queue-root.engineering>
          

          ...For example, take an app framework that runs multiple utility jobs simultaneously and needs to set the relative priorities between them. If label names themselves are in the API then that app framework doesn't work on some clusters that don't have the expected labels configured...

          I think this may not be a very big issue. Whether the application uses labels or integers, some additional configuration has to be made either way. We should only need to edit configuration, not change code.

          eepayne Eric Payne added a comment -

          I think label-based and integer-based priorities are just two different ways to configure as well as API. No matter we choose to use label-based or integer-based priority, we should use integer only to implement internal logic (like in CapacityScheduler).

          I think that is true. Especially when passing priorities through protocol buffers, using integers is best.

          sunilg Sunil G added a comment -

          Hi Vinod Kumar Vavilapalli, Wangda Tan, Eric Payne, Jason Lowe,
          Passing the priority to the schedulers as an integer will be the first target. A manager can then act as a single point of contact that translates from label to integer and vice versa. Yes, this adds complexity to the RM, but keeping it out of the scheduler removes much of the manipulation logic there.

             <alias> 
                 0:VERY_LOW, 1:LOW, 2:NORMAL, 3:HIGH, 4:VERY_HIGH
             </alias>
          

          I feel we can keep such label config in a common place that is accessible to any scheduler.
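
A minimal sketch of such a label-to-integer translator, assuming the alias format shown above (`0:VERY_LOW, 1:LOW, ...`). The class name `PriorityAliasTable` and its methods are hypothetical and are not taken from any YARN patch; the point is only that the scheduler side never needs to see a label.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a label <-> integer translator; not actual RM code. */
class PriorityAliasTable {
    private final Map<String, Integer> labelToInt = new HashMap<>();
    private final Map<Integer, String> intToLabel = new HashMap<>();

    /** Parses an alias string such as "0:VERY_LOW, 1:LOW, 2:NORMAL". */
    PriorityAliasTable(String aliasSpec) {
        for (String entry : aliasSpec.split(",")) {
            String[] parts = entry.trim().split(":");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Bad alias entry: " + entry);
            }
            int value = Integer.parseInt(parts[0].trim());
            String label = parts[1].trim();
            labelToInt.put(label, value);
            intToLabel.put(value, label);
        }
    }

    /** Label -> integer, e.g. at application submission time. */
    int toInt(String label) {
        Integer value = labelToInt.get(label);
        if (value == null) {
            throw new IllegalArgumentException("Unknown priority label: " + label);
        }
        return value;
    }

    /** Integer -> label for display; falls back to the number when no alias exists. */
    String toLabel(int value) {
        String label = intToLabel.get(value);
        return label != null ? label : Integer.toString(value);
    }
}
```

With this shape, a submission carrying `HIGH` would be converted once at the edge, and everything downstream would compare plain integers.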

          leftnoteasy Wangda Tan added a comment -

          I feel we can make such label config in a common place which can be accessible for any schedulers.

          Agreed, this should be part of the YARN configuration. I put it in the queue config only to make the proposal easier to read.

          sunilg Sunil G added a comment -

          If we introduce labels to represent specific application priorities for configuration and usage, we gain readability and ease of handling. Also, IMO, we are unlikely to have a large set of priorities running into the hundreds or more. Even if we do have more priorities, a range of labels can still help.

             <alias> 
                 0:VERY_LOW, 1:LOW, 2:NORMAL, 3:HIGH, 4:VERY_HIGH
             </alias>
          

          Vinod Kumar Vavilapalli, Jason Lowe, Wangda Tan, Eric Payne, please share your thoughts.

          jlowe Jason Lowe added a comment -

          Part of the concern with the above proposal is that we're mapping labels to numbers, and that range is packed tight. If someone needs a priority like MEDIUM_HIGH then we have to dynamically remap the label to number mapping and update all existing priorities for applications (since we mapped them to numbers for performance in the scheduling algorithms).

          For me it comes back to my previous comment: priorities are meaningless unless compared to other priorities. It's easy to reason about numerical comparisons, many other systems already do this, and IMHO we can keep things simple initially. If we need to add label aliases we can always extend the API to do this later.

          If everyone else feels like we have to have labels out of the gate then I won't block it, but I'd like to see the basic functionality working before we complicate it.
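
To make the remapping concern concrete, here is a minimal sketch. It assumes a tightly packed mapping in which a label's integer is simply its index in an ordered list; that representation, and the class `TightRemapSketch`, are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of inserting a label into a tightly packed mapping. */
class TightRemapSketch {
    /** Returns a new ordered label list with newLabel inserted at index position. */
    static List<String> insertLabel(List<String> labels, int position, String newLabel) {
        List<String> updated = new ArrayList<>(labels);
        updated.add(position, newLabel);
        return updated;
    }

    /** Maps an old integer priority to its value after an insert at position. */
    static int remap(int oldPriority, int position) {
        return oldPriority >= position ? oldPriority + 1 : oldPriority;
    }

    /** Counts how many existing applications would need their stored priority rewritten. */
    static int countAffected(int[] appPriorities, int position) {
        int affected = 0;
        for (int p : appPriorities) {
            if (remap(p, position) != p) {
                affected++;
            }
        }
        return affected;
    }
}
```

Inserting `MEDIUM_HIGH` at position 3 of `[VERY_LOW, LOW, NORMAL, HIGH, VERY_HIGH]` shifts `HIGH` to 4 and `VERY_HIGH` to 5, so every application already stored at 3 or 4 has to be updated.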

          leftnoteasy Wangda Tan added a comment -

          Agreed that we need to get the basic functionality working. I suggest keeping the simple, tightly packed alias for now. We should try to get both in, but label-based priority shouldn't block int-based priority development.

          sunilg Sunil G added a comment -

          Yes. We could try to support both integers and labels (with mappings). We may open independent JIRAs to handle this (there are patches for both, which will be synced up as one), which should achieve the same goal without added complexity. We will aim for a simpler version for now, without complex re-mappings etc.

          grey Lei Guo added a comment -

          I agree with Jason Lowe: the integer is the basis of priority, and a label should be just an alias used during application submission. If we keep both labels and integers in the system, it could get complicated when an administrator changes the label/range mapping.

          It's true that we do not expect users to assign many different priorities, but we may enhance the scheduler to calculate priority dynamically based on certain criteria, for example the pending time or a certain time frame. In that case, the priority could be any number.
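
A minimal sketch of what a pending-time-based priority could look like. The aging rule (one priority step per fixed interval of pending time, capped at a maximum) and all names here are assumptions made for illustration; nothing in this thread specifies the actual formula.

```java
/** Hypothetical aging rule: priority rises by one step per interval spent pending. */
class AgingPrioritySketch {
    /**
     * @param basePriority  priority assigned at submission
     * @param pendingMillis how long the application has been pending
     * @param millisPerStep pending time needed to gain one priority step (assumed knob)
     * @param maxPriority   upper bound; the result never exceeds this value
     */
    static int effectivePriority(int basePriority, long pendingMillis,
                                 long millisPerStep, int maxPriority) {
        if (millisPerStep <= 0) {
            throw new IllegalArgumentException("millisPerStep must be positive");
        }
        // Clamp the boost so the addition below cannot overflow.
        long boost = Math.min(Math.max(0L, pendingMillis) / millisPerStep, Integer.MAX_VALUE);
        long raised = (long) basePriority + boost;
        return (int) Math.min(raised, (long) maxPriority);
    }
}
```

Because the result can land on any integer between the base and the cap, a fixed set of label aliases would not cover every value the scheduler might compute.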

          sunilg Sunil G added a comment -

          Thank you, Lei Guo, for sharing your thoughts.
          As per the design, only integers will be used in the schedulers, so all comparisons and operations can be done on integers. However, we can have a label mapping for each integer, which can be used during application submission, for display in the UI, etc. Labels can be added purely as mappings to integers.
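
A minimal sketch of integer-only ordering on the scheduler side. It assumes that a larger integer means a higher priority and that ties are broken by submission time; the `App` type and the `ORDER` comparator are hypothetical and do not correspond to CapacityScheduler classes.

```java
import java.util.Comparator;

/** Hypothetical ordering of applications by integer priority only. */
class AppOrderSketch {
    static final class App {
        final String id;
        final int priority;     // integer priority; labels never reach this layer
        final long submitTime;  // milliseconds, used only to break ties

        App(String id, int priority, long submitTime) {
            this.id = id;
            this.priority = priority;
            this.submitTime = submitTime;
        }
    }

    /** Higher priority first (assumption); earlier submission first among equals. */
    static final Comparator<App> ORDER =
        Comparator.comparingInt((App a) -> a.priority)
                  .reversed()
                  .thenComparingLong(a -> a.submitTime);
}
```

Any label shown in the UI would be looked up from the integer at display time, so changing a label name would not affect this ordering.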

          jianhe Jian He added a comment -

          I think we need to move this forward..

          Overall, I prefer numeric priority to label-based priority because it is simpler and more flexible if users want to define a wide range of priorities. It needs no extra configs, and users do not need to be re-educated every time the mapping changes.

          Also, one problem is that if we refresh the priority mapping while some long-running jobs are already running at a certain priority, how do we map the previous priority range to the new one?

          In addition, if everyone runs their applications at “VERY_HIGH” priority, then “HIGH”, despite its name, is no longer really high priority; it effectively becomes the lowest priority. My point is that a priority only makes sense when compared with its peers. In that sense, I think a utility that surfaces how applications are distributed across priorities, so that users can reason about where to place an application, may be more useful than a static name mapping that lets people infer relative importance from names.
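
A minimal sketch of the kind of distribution utility described above, assuming the integer priorities of the applications in a queue are available as a plain array. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that counts applications at each integer priority. */
class PriorityDistributionSketch {
    /** Returns priority -> number of applications, sorted by ascending priority. */
    static Map<Integer, Integer> countByPriority(int[] appPriorities) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int p : appPriorities) {
            counts.merge(p, 1, Integer::sum);
        }
        return counts;
    }
}
```

If most applications turn out to sit at the top value, a user can see that submitting one level lower effectively places the application at the back of the line, whatever that level is called.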

          grey Lei Guo added a comment -

          Sunil G, YARN-2003 is already committed; should we get the design document updated?

          sunilg Sunil G added a comment -

          Hi Lei Guo
          Yes. I am updating the document now and will upload a version of the doc soon.

          sunilg Sunil G added a comment -

          Updating design doc.
          Thank You.

          imstefanlee stefanlee added a comment -

          Sunil G, thanks for this JIRA. I have a question about "creating multiple queues and making
          users submit applications to higher priority and lower priority queues separately" in your doc. Does it mean that we create multiple queues, e.g. queue A and queue B, then label A as the higher-priority queue and B as the lower-priority queue, after which users can submit higher-priority applications to A? Looking at your code, though, it seems users can submit applications with different priorities to the same queue, where the queue has a default priority and the cluster has a max priority.


            People

            • Assignee:
              sunilg Sunil G
              Reporter:
              acmurthy Arun C Murthy
            • Votes:
              8
              Watchers:
              65
