Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: scheduler
    • Labels:
      None

      Description

      This is an umbrella JIRA for work needed to allow moving YARN applications from one queue to another. The work will consist of additions in the command line options, additions in the client RM protocol, and changes in the schedulers to support this.

      I have a picture of how this should function in the Fair Scheduler, but I'm not familiar enough with the Capacity Scheduler for the same there. Ultimately, the decision to whether an application can be moved should go down to the scheduler - some schedulers may wish not to support this at all. However, schedulers that do support it should share some common semantics around ACLs and what happens to running containers.

      Here is how I see the general semantics working out:

      • A move request is issued by the client. After it gets past ACLs, the scheduler checks whether executing the move will violate any constraints. For the Fair Scheduler, these would be queue maxRunningApps and queue maxResources constraints
      • All running containers are transferred from the old queue to the new queue
      • All outstanding requests are transferred from the old queue to the new queue

      Here is I see the ACLs of this working out:

      • To move an app from a queue a user must have modify access on the app or administer access on the queue
      • To move an app to a queue a user must have submit access on the queue or administer access on the queue

        Activity

        Hide
        bpodgursky Ben Podgursky added a comment -

        Hi,

        Are there any plans to let users move jobs between queues via the web UI, like with the MR1 fair scheduler? We found this feature very useful.

        Show
        bpodgursky Ben Podgursky added a comment - Hi, Are there any plans to let users move jobs between queues via the web UI, like with the MR1 fair scheduler? We found this feature very useful.
        Hide
        kkambatl Karthik Kambatla (Inactive) added a comment -

        None of the patches introduce any Public-Stable APIs. I think it is reasonable to merge them to branch-2. +1. Thanks Sandy.

        Show
        kkambatl Karthik Kambatla (Inactive) added a comment - None of the patches introduce any Public-Stable APIs. I think it is reasonable to merge them to branch-2. +1. Thanks Sandy.
        Hide
        hitesh Hitesh Shah added a comment -

        Sandy Ryza Just to be clear, I have not reviewed the code as such so please do not consider my comment as a +1 if that is required for the merge. I am assuming there are others who are more familiar with these changes and have reviewed/+1'ed it for the merge.

        Show
        hitesh Hitesh Shah added a comment - Sandy Ryza Just to be clear, I have not reviewed the code as such so please do not consider my comment as a +1 if that is required for the merge. I am assuming there are others who are more familiar with these changes and have reviewed/+1'ed it for the merge.
        Hide
        sandyr Sandy Ryza added a comment -

        From an API point of view, there should be a way for the application at run-time/registration time to find out what features are supported or not supported by the currently configured scheduler in the RM.

        That makes total sense to me.

        Now, for moving apps across schedulers - given that it is a client only feature and there is no changes required in an application, my previous comment's argument does not hold for this feature.

        Great. Will go forward with the merge to branch-2.

        Show
        sandyr Sandy Ryza added a comment - From an API point of view, there should be a way for the application at run-time/registration time to find out what features are supported or not supported by the currently configured scheduler in the RM. That makes total sense to me. Now, for moving apps across schedulers - given that it is a client only feature and there is no changes required in an application, my previous comment's argument does not hold for this feature. Great. Will go forward with the merge to branch-2.
        Hide
        hitesh Hitesh Shah added a comment -

        The way I see it - it is and should be ok for different schedulers to support a different set of features. The behavior should be the same across all schedulers if the feature is supported.

        @Karthik, I dont believe it is right to do a half-baked approach regardless of which scheduler builds the feature first. The main concern is for an app developer and how a new feature or the lack of it affects someone writing their own application.

        From an API point of view, there should be a way for the application at run-time/registration time to find out what features are supported or not supported by the currently configured scheduler in the RM. This allows for applications to be written correctly and to make the necessary changes in the calls to the RM to work around advanced vs primitive schedulers. If schedulers are going to differ in terms of feature support, then an API to find out whether a feature is supported or not should be considered a blocker for a release. I believe this only holds for APIs affecting application masters for now but there may be situations where a client could be affected too.

        Now, for moving apps across schedulers - given that it is a client only feature and there is no changes required in an application, my previous comment's argument does not hold for this feature. ( I assume that Fifo and CS will both throw an appropriate UnsupportedOperationException on a move call? )

        Show
        hitesh Hitesh Shah added a comment - The way I see it - it is and should be ok for different schedulers to support a different set of features. The behavior should be the same across all schedulers if the feature is supported. @Karthik, I dont believe it is right to do a half-baked approach regardless of which scheduler builds the feature first. The main concern is for an app developer and how a new feature or the lack of it affects someone writing their own application. From an API point of view, there should be a way for the application at run-time/registration time to find out what features are supported or not supported by the currently configured scheduler in the RM. This allows for applications to be written correctly and to make the necessary changes in the calls to the RM to work around advanced vs primitive schedulers. If schedulers are going to differ in terms of feature support, then an API to find out whether a feature is supported or not should be considered a blocker for a release. I believe this only holds for APIs affecting application masters for now but there may be situations where a client could be affected too. Now, for moving apps across schedulers - given that it is a client only feature and there is no changes required in an application, my previous comment's argument does not hold for this feature. ( I assume that Fifo and CS will both throw an appropriate UnsupportedOperationException on a move call? )
        Hide
        kkambatl Karthik Kambatla (Inactive) added a comment -

        I agree with Sandy here. If it were a scheduler feature that is fundamental or would bring the RM down if a scheduler doesn't implement, I think it would be a good idea to not ship/ merge to branch-2. However, moving applications between queues is benign. If the user tries to move using a scheduler that doesn't support move yet, it would only say moving apps is not supported.

        It might not be the best analogy, but I think supporting multiple resources in the scheduler is similar. CS supported it first, but it was okay for users to use related configs while using FS as well; it just didn't take multiple resources into account.

        Show
        kkambatl Karthik Kambatla (Inactive) added a comment - I agree with Sandy here. If it were a scheduler feature that is fundamental or would bring the RM down if a scheduler doesn't implement, I think it would be a good idea to not ship/ merge to branch-2. However, moving applications between queues is benign. If the user tries to move using a scheduler that doesn't support move yet, it would only say moving apps is not supported. It might not be the best analogy, but I think supporting multiple resources in the scheduler is similar. CS supported it first, but it was okay for users to use related configs while using FS as well; it just didn't take multiple resources into account.
        Hide
        sandyr Sandy Ryza added a comment -

        Sorry if I'm jumping hard on this one. It's something I've thought a fair bit about when adding scheduler features to YARN. We're all busy, and I want to be in a position where a scheduler can advance even when those familiar with the other scheduler aren't able to contribute dev time to immediately implement the same feature. This obviously shouldn't be at the expense of adding huge amounts of complexity for users. But I think in this case, the concept of "If I go with X scheduler, I will be able to move jobs between queues, and if I go with Y scheduler, this is not supported yet" is not super difficult for administrators to digest. Much less difficult to digest than many of the other differences between schedulers.

        Show
        sandyr Sandy Ryza added a comment - Sorry if I'm jumping hard on this one. It's something I've thought a fair bit about when adding scheduler features to YARN. We're all busy, and I want to be in a position where a scheduler can advance even when those familiar with the other scheduler aren't able to contribute dev time to immediately implement the same feature. This obviously shouldn't be at the expense of adding huge amounts of complexity for users. But I think in this case, the concept of "If I go with X scheduler, I will be able to move jobs between queues, and if I go with Y scheduler, this is not supported yet" is not super difficult for administrators to digest. Much less difficult to digest than many of the other differences between schedulers.
        Hide
        sandyr Sandy Ryza added a comment -

        While I would argue that preemption is actually more impactful to users than the option to move apps, as they need to make their applications resilient to it, it's not the only example. Strict locality and preemption warnings went into the Fair Scheduler before the Capacity Scheduler, blacklisting went into the Capacity Scheduler first. The users for moving applications between queues are cluster administrators, who already need to be aware of the operational differences between different schedulers. There are many reasons why moving an application between queues may fail, some of them internal to the scheduler, such as a violation of resource configurations, some of them external, such as an application being in a particular state. Using a scheduler that doesn't support it is just another example.

        While having a consistent experience across schedulers is nice, and we should be very careful to keep the semantics the same when multiple schedulers support it, I think blocking it in one scheduler because the other doesn't support it is an unnecessary drag on the pace of development.

        Show
        sandyr Sandy Ryza added a comment - While I would argue that preemption is actually more impactful to users than the option to move apps, as they need to make their applications resilient to it, it's not the only example. Strict locality and preemption warnings went into the Fair Scheduler before the Capacity Scheduler, blacklisting went into the Capacity Scheduler first. The users for moving applications between queues are cluster administrators, who already need to be aware of the operational differences between different schedulers. There are many reasons why moving an application between queues may fail, some of them internal to the scheduler, such as a violation of resource configurations, some of them external, such as an application being in a particular state. Using a scheduler that doesn't support it is just another example. While having a consistent experience across schedulers is nice, and we should be very careful to keep the semantics the same when multiple schedulers support it, I think blocking it in one scheduler because the other doesn't support it is an unnecessary drag on the pace of development.
        Hide
        hitesh Hitesh Shah added a comment -

        Pre-emption is an internal implementation choice of how a scheduler enforces its policies. This case is more towards supporting a feature that users can use - but users would then be forced to be aware of what scheduler is being used to decide whether the feature is supported or not.

        It is good that schedulers are getting enhanced however it seems wrong from a YARN user or app-developer's point of view if the user has to do things differently just because a scheduler does/does not support a feature. There should be a way to define an API for the app developer which ensures that there is a strict contract on what will be done. There could be some way to define implementation-specific APIs too but those should be clearly called out that the feature/api is not supported by all schedulers.

        Show
        hitesh Hitesh Shah added a comment - Pre-emption is an internal implementation choice of how a scheduler enforces its policies. This case is more towards supporting a feature that users can use - but users would then be forced to be aware of what scheduler is being used to decide whether the feature is supported or not. It is good that schedulers are getting enhanced however it seems wrong from a YARN user or app-developer's point of view if the user has to do things differently just because a scheduler does/does not support a feature. There should be a way to define an API for the app developer which ensures that there is a strict contract on what will be done. There could be some way to define implementation-specific APIs too but those should be clearly called out that the feature/api is not supported by all schedulers.
        Hide
        sandyr Sandy Ryza added a comment -

        The changes for the Capacity Scheduler aren't yet done. In the past, we've never used this as a basis for not including a feature. For example, preemption was supported in YARN and the Fair Scheduler long before it reached the Capacity Scheduler. The Capacity Scheduler is default because we can't choose both as the default, but the Fair Scheduler is recommended / first class as well. Even if we decided to change our stance there, I don't think that would be a basis for not merging what we have so far into branch-2.

        Show
        sandyr Sandy Ryza added a comment - The changes for the Capacity Scheduler aren't yet done. In the past, we've never used this as a basis for not including a feature. For example, preemption was supported in YARN and the Fair Scheduler long before it reached the Capacity Scheduler. The Capacity Scheduler is default because we can't choose both as the default, but the Fair Scheduler is recommended / first class as well. Even if we decided to change our stance there, I don't think that would be a basis for not merging what we have so far into branch-2.
        Hide
        hitesh Hitesh Shah added a comment -

        Sandy Ryza Are the changes for capacity scheduler also done? If not, I am not sure how we can add a feature where the default scheduler does not support it.

        Show
        hitesh Hitesh Shah added a comment - Sandy Ryza Are the changes for capacity scheduler also done? If not, I am not sure how we can add a feature where the default scheduler does not support it.
        Hide
        sandyr Sandy Ryza added a comment -

        With the bulk of this implemented and tested, I'm planning to merge this to branch-2. Will do so tomorrow unless there are objections.

        Show
        sandyr Sandy Ryza added a comment - With the bulk of this implemented and tested, I'm planning to merge this to branch-2. Will do so tomorrow unless there are objections.
        Hide
        sandyr Sandy Ryza added a comment -

        Good point, Bikas. Filed YARN-1558 for this.

        Show
        sandyr Sandy Ryza added a comment - Good point, Bikas. Filed YARN-1558 for this.
        Hide
        bikassaha Bikas Saha added a comment -

        The app submission context saved in the store would need to be updated with the new queue information, after the scheduler has accepted the move but before the user gets notified about move completion.

        Show
        bikassaha Bikas Saha added a comment - The app submission context saved in the store would need to be updated with the new queue information, after the scheduler has accepted the move but before the user gets notified about move completion.
        Hide
        sandyr Sandy Ryza added a comment -

        We have to touch RMApp etc before hitting scheduler as state in RM is partitioned inside and outside scheduler.

        Sorry, I wasn't clear - definitely agree we need to go through RM app, just was wondering whether to do it with events or synchronously. Thanks for the heads up on the race condition - will watch out for that.

        The paradigm followed is a multi-phase request

        An issue with doing a multi-phase request is that, if the move fails, we would like to return an appropriate error message with the reason to the client, and the reason can go as far down as the scheduler. We could give the client a request ID that they could come back with to find the result, but that kind of seems like overkill to me. While async/multi-phase requests 100% make sense to me in situations like the AMRM protocol where requests come in all the time, moves will normally be human-initiated requests that come with very low frequency. I'll write the code with events, which will allow us to take either the blocking (with a Future) or non-blocking approach.

        Show
        sandyr Sandy Ryza added a comment - We have to touch RMApp etc before hitting scheduler as state in RM is partitioned inside and outside scheduler. Sorry, I wasn't clear - definitely agree we need to go through RM app, just was wondering whether to do it with events or synchronously. Thanks for the heads up on the race condition - will watch out for that. The paradigm followed is a multi-phase request An issue with doing a multi-phase request is that, if the move fails, we would like to return an appropriate error message with the reason to the client, and the reason can go as far down as the scheduler. We could give the client a request ID that they could come back with to find the result, but that kind of seems like overkill to me. While async/multi-phase requests 100% make sense to me in situations like the AMRM protocol where requests come in all the time, moves will normally be human-initiated requests that come with very low frequency. I'll write the code with events, which will allow us to take either the blocking (with a Future) or non-blocking approach.
        Hide
        vinodkv Vinod Kumar Vavilapalli added a comment -

        Tx for laying out the use-case.

        I don't see a reason that we shouldn't be able to move an app that has been submitted, but not accepted, or that is very close to completion.

        There is a race condition when scheduler is in the process of accepting an app to a queue and a corresponding queue-move request comes in. Like you said, we just need to be careful.

        Ha, your first question is related to the above.

        We have to touch RMApp etc before hitting scheduler as state in RM is partitioned inside and outside scheduler. So we may not be able to go directly to the scheduler. Typically, we don't block RPCs either - bad in multi-tenant clusters. The paradigm followed is a multi-phase request - submitApp and poll for its status, kill-app and poll for its status (YARN-1446). You could do something like that here too.

        The other race condition that just occurred to me is apps and app-attempts. RM may be in the process of creating a new app-attempt while the move request comes in. Scheduler only knows about App-Attempt today - that could be a hard issue. Jian is trying to fix it via YARN-1493. You may need to wait for that to avoid those race conditions.

        Show
        vinodkv Vinod Kumar Vavilapalli added a comment - Tx for laying out the use-case. I don't see a reason that we shouldn't be able to move an app that has been submitted, but not accepted, or that is very close to completion. There is a race condition when scheduler is in the process of accepting an app to a queue and a corresponding queue-move request comes in. Like you said, we just need to be careful. Ha, your first question is related to the above. We have to touch RMApp etc before hitting scheduler as state in RM is partitioned inside and outside scheduler. So we may not be able to go directly to the scheduler. Typically, we don't block RPCs either - bad in multi-tenant clusters. The paradigm followed is a multi-phase request - submitApp and poll for its status, kill-app and poll for its status ( YARN-1446 ). You could do something like that here too. The other race condition that just occurred to me is apps and app-attempts. RM may be in the process of creating a new app-attempt while the move request comes in. Scheduler only knows about App-Attempt today - that could be a hard issue. Jian is trying to fix it via YARN-1493 . You may need to wait for that to avoid those race conditions.
        Hide
        sandyr Sandy Ryza added a comment -

        Also, a coding question you can maybe provide me guidance on?

        Ideally, we would like to return the RPC with whether or not the operation succeeded. However, we need to go down through the app, app attempt, and finally, scheduler to determine this. We could achieve this in a couple of ways:

        • Use an aync event at each level as is the convention (e.g. as is done for killing an application). Have the call in ClientRMService block and wait for things to get sorted out lower down before returning. Not entirely sure what we would wait for because the ClientRMService itself doesn't receive events. A Future might be clean.
        • Bypass events and go synchronously through to the scheduler.
          Is one of these preferred? Is there a third path I'm missing?
        Show
        sandyr Sandy Ryza added a comment - Also, a coding question you can maybe provide me guidance on? Ideally, we would like to return the RPC with whether or not the operation succeeded. However, we need to go down through the app, app attempt, and finally, scheduler to determine this. We could achieve this in a couple of ways: Use an aync event at each level as is the convention (e.g. as is done for killing an application). Have the call in ClientRMService block and wait for things to get sorted out lower down before returning. Not entirely sure what we would wait for because the ClientRMService itself doesn't receive events. A Future might be clean. Bypass events and go synchronously through to the scheduler. Is one of these preferred? Is there a third path I'm missing?
        Hide
        sandyr Sandy Ryza added a comment -

        Thanks for taking a look Vinod.

        Any specific use-case? Example where it can be used? To justify this isn't feature creep.

        Yeah, we've seen requests for this a few times. I think the most common scenario is that someone experiences job slowly because of the queue that it's in and the job needs to be placed in a queue where it can complete more quickly. This can occur because it's taking longer than expected and a deadline is approaching, the original queue is fuller than expected, the job was submitted incorrectly in the first place but has made some progress, or for a number of other reasons.

        What happens when scheduling-constraints are violated? The client will just get an error? It kind of depends on the type of scheduling constraint.

        Not sure how this should play out for the Capacity Scheduler, but for the Fair Scheduler constraints I mentioned in the description I think the client should get an error. I suppose another option would be to kill containers until the constraints would be satisfied, but I think this is a lot more work and not clearly better behavior.

        Who initiates the move any regular user or just admins?

        My opinion is any regular user, within ACLs. I.e. if I could kill my job and resubmit it to a different queue, I should be able to move it.

        Only running apps can be moved?

        I don't see a reason that we shouldn't be able to move an app that has been submitted, but not accepted, or that is very close to completion. In some cases we may not need to touch the scheduler. There are definitely race conditions we need to be careful of here.

        Apps may be in the process of submitting new requests. What happens to them? I guess queue-move and new-requests should be synchronized.

        Right.

        Show
        sandyr Sandy Ryza added a comment - Thanks for taking a look Vinod. Any specific use-case? Example where it can be used? To justify this isn't feature creep. Yeah, we've seen requests for this a few times. I think the most common scenario is that someone experiences job slowly because of the queue that it's in and the job needs to be placed in a queue where it can complete more quickly. This can occur because it's taking longer than expected and a deadline is approaching, the original queue is fuller than expected, the job was submitted incorrectly in the first place but has made some progress, or for a number of other reasons. What happens when scheduling-constraints are violated? The client will just get an error? It kind of depends on the type of scheduling constraint. Not sure how this should play out for the Capacity Scheduler, but for the Fair Scheduler constraints I mentioned in the description I think the client should get an error. I suppose another option would be to kill containers until the constraints would be satisfied, but I think this is a lot more work and not clearly better behavior. Who initiates the move any regular user or just admins? My opinion is any regular user, within ACLs. I.e. if I could kill my job and resubmit it to a different queue, I should be able to move it. Only running apps can be moved? I don't see a reason that we shouldn't be able to move an app that has been submitted, but not accepted, or that is very close to completion. In some cases we may not need to touch the scheduler. There are definitely race conditions we need to be careful of here. Apps may be in the process of submitting new requests. What happens to them? I guess queue-move and new-requests should be synchronized. Right.
        Hide
        vinodkv Vinod Kumar Vavilapalli added a comment -

        Hi Sandy, some questions and quick thoughts on this ticket:

        • Any specific use-case? Example where it can be used? To justify this isn't feature creep.
        • What happens when scheduling-constraints are violated? The client will just get an error? It kind of depends on the type of scheduling constraint.
        • Who initiates the move any regular user or just admins? Given your description of ACLs, seems like any one.
        • Only running apps can be moved? There are races w.r.t apps that are submitted but not accepted and close-to-completion apps.
        • The ACLs choice seems straightforward and makes sense.

        There is some non-trivial stuff that needs ironing out, outside of schedulers.

        • While the move happens,
          • Apps may be in the process of submitting new requests. What happens to them? I guess queue-move and new-requests should be synchronized.
          • Preemption monitors will need to be notified. As they kind of know a lot about schedulers but sit outside the schedulers.
          • there will be a potential wild-change in the head-room for the application.
        Show
        vinodkv Vinod Kumar Vavilapalli added a comment - Hi Sandy, some questions and quick thoughts on this ticket: Any specific use-case? Example where it can be used? To justify this isn't feature creep. What happens when scheduling-constraints are violated? The client will just get an error? It kind of depends on the type of scheduling constraint. Who initiates the move any regular user or just admins? Given your description of ACLs, seems like any one. Only running apps can be moved? There are races w.r.t apps that are submitted but not accepted and close-to-completion apps. The ACLs choice seems straightforward and makes sense. There is some non-trivial stuff that needs ironing out, outside of schedulers. While the move happens, Apps may be in the process of submitting new requests. What happens to them? I guess queue-move and new-requests should be synchronized. Preemption monitors will need to be notified. As they kind of know a lot about schedulers but sit outside the schedulers. there will be a potential wild-change in the head-room for the application.

          People

          • Assignee:
            sandyr Sandy Ryza
            Reporter:
            sandyr Sandy Ryza
          • Votes:
            0 Vote for this issue
            Watchers:
            21 Start watching this issue

            Dates

            • Created:
              Updated:

              Development