Hadoop YARN / YARN-6029

CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by one thread and LeafQueue#assignContainers is releasing excessive reserved container by another thread

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.8.0
    • Component/s: capacityscheduler
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      When ParentQueue#getQueueUserAclInfo is called (e.g. a client calls YarnClient#getQueueAclsInfo) just as LeafQueue#assignContainers is running and has not yet notified the parent queue to release resources (for an excess reserved container that should be released), the ResourceManager can deadlock. I found this problem in our testing environment for Hadoop 2.8.

      Steps to reproduce the deadlock, in chronological order:

      • 1. Thread A (ResourceManager Event Processor) calls synchronized LeafQueue#assignContainers (takes the LeafQueue instance lock of queue root.a)
      • 2. Thread B (IPC Server handler) calls synchronized ParentQueue#getQueueUserAclInfo (takes the ParentQueue instance lock of queue root), iterates over the children's queue ACLs, and blocks when calling synchronized LeafQueue#getQueueUserAclInfo (the LeafQueue instance lock of queue root.a is held by Thread A)
      • 3. Thread A wants to inform the parent queue that a container is being completed and blocks when invoking the synchronized ParentQueue#internalReleaseResource method (the ParentQueue instance lock of queue root is held by Thread B)

      I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be removed to solve this problem, since this method does not appear to modify any fields of the LeafQueue instance.
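The lock-order inversion described above can be reproduced deterministically in miniature. The sketch below is illustrative, not real YARN code: `parent` stands for the root ParentQueue monitor and `leaf` for the root.a LeafQueue monitor, and ReentrantLock#tryLock is used so the demo detects the inversion instead of actually hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockInversionDemo {

    static final ReentrantLock parent = new ReentrantLock(); // root's monitor
    static final ReentrantLock leaf = new ReentrantLock();   // root.a's monitor

    public static boolean inversionObservable() {
        CountDownLatch leafHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Thread A: LeafQueue#assignContainers, which later calls up
        // into ParentQueue#internalReleaseResource.
        Thread schedulerThread = new Thread(() -> {
            leaf.lock();                       // step 1: take root.a's lock
            try {
                leafHeld.countDown();
                release.await();               // wait until B has probed the leaf lock
                parent.lock();                 // step 3: with real monitors this blocks
                parent.unlock();
            } catch (InterruptedException ignored) {
            } finally {
                leaf.unlock();
            }
        });
        schedulerThread.start();

        try {
            leafHeld.await();
            // Thread B (here: the main thread): ParentQueue#getQueueUserAclInfo.
            parent.lock();                     // step 2: take root's lock
            boolean gotLeaf;
            try {
                // With synchronized methods this would block forever; tryLock
                // lets us observe that the leaf lock is unavailable.
                gotLeaf = leaf.tryLock(100, TimeUnit.MILLISECONDS);
                if (gotLeaf) {
                    leaf.unlock();
                }
            } finally {
                parent.unlock();
            }
            release.countDown();
            schedulerThread.join();
            return !gotLeaf;                   // true: each thread held what the other needed
        } catch (InterruptedException e) {
            return false;
        }
    }
}
```

With real synchronized monitors neither thread could ever back off, which is exactly the deadlock captured in the attached jstack.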

      Attaching a patch with a unit test for review.

      1. deadlock.jstack
        187 kB
        Tao Yang
      2. YARN-6029.001.patch
        9 kB
        Tao Yang
      3. YARN-6029.002.patch
        10 kB
        Tao Yang
      4. YARN-6029-branch-2.8.001.patch
        11 kB
        Tao Yang

        Activity

        Naganarasimha G R added a comment -

        Thanks for working on the patch, Tao Yang. What's actually happening is an inversion of the lock order: in one flow we hold the lock of the leaf and try to take the lock of the parent (the completedContainer flow), and in the other we hold the lock of the parent and then try to take the lock of the leaf (the getQueueUserAclInfo flow). A better solution would be to do as 2.9/trunk does, where read locks were introduced so that getQueueUserAclInfo takes a read lock and completedContainer takes a write lock, avoiding this inversion. That would be a big change, though, and I am not completely sure about your fix, since you are only removing the modifier from LeafQueue.

        Junping Du added a comment -

        Thanks Tao Yang for reporting the issue. I think the issue is valid, given the existing code flow and what your jstack shows. As for your current patch, I am a little concerned that removing synchronized from getQueueUserAclInfo entirely could cause other concurrency issues.
        However, Naganarasimha G R, I don't quite understand your proposed solution here: if we make exactly the same change as trunk/branch-2.9, thread A (the completedContainer flow) can hold the write lock on queue root.a while waiting for the write lock on queue root, and thread B (the getQueueUserAclInfo flow) can hold the read lock on queue root while waiting for the read lock on queue root.a. Nothing gets better. Am I missing anything here?
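Junping Du's premise here — that a pending writer must still wait for outstanding readers, so read/write locks alone do not break the cycle — can be checked with a minimal java.util.concurrent sketch (illustrative names, not YARN code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadBlocksWriteDemo {

    public static boolean writerBlockedByReader() {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        CountDownLatch readerIn = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(1);

        // Stand-in for getQueueUserAclInfo holding a queue's read lock.
        Thread reader = new Thread(() -> {
            rw.readLock().lock();
            readerIn.countDown();
            try {
                done.await();
            } catch (InterruptedException ignored) {
            }
            rw.readLock().unlock();
        });
        reader.start();

        try {
            readerIn.await();
            // Stand-in for completedContainer needing the write lock:
            // it cannot proceed while any reader holds the lock.
            boolean gotWrite = rw.writeLock().tryLock();
            if (gotWrite) {
                rw.writeLock().unlock();
            }
            done.countDown();
            reader.join();
            return !gotWrite;   // true: the writer had to wait for the reader
        } catch (InterruptedException e) {
            return false;
        }
    }
}
```

So converting each queue's monitor to a ReentrantReadWriteLock, by itself, would still allow the circular wait; the later comments explain that YARN-5706's larger restructuring is what actually resolves it in 2.9/trunk.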

        Wangda Tan added a comment -

        Thanks Tao Yang for reporting this issue.

        Naganarasimha G R, branch-2/trunk solves the problem after YARN-5706.

        However, backporting YARN-5706 to fix this issue would take a huge effort; I don't think it is even an option.

        We can make some changes to LeafQueue:

        1. Remove synchronized lock of assignContainers
        2. Make changes:

        # BEGINNING of LeafQueue#assignContainers
        synchronized {
           // do stuffs
        }
        
        call-complete-containers (which locks parent) 
        
        synchronized {
           // do rest stuffs
        }
        # END of LeafQueue#assignContainers
        

        Removing synchronized entirely would cause data inconsistency on reads, and there are some other methods with the same pattern that need to change as well (grab the LeafQueue lock while holding the ParentQueue lock, and do not grab the CapacityScheduler's lock).
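The split-lock pattern sketched above can be written out as a compilable toy (class, method, and lock names are illustrative, not the real CapacityScheduler code). Because the restructured leaf method never holds its own monitor while taking the parent's, the only nested acquisition left is parent-then-leaf, so the lock graph is acyclic and the two flows cannot deadlock:

```java
public class SplitLockSketch {

    static final Object parentLock = new Object(); // stands for root's monitor
    static final Object leafLock = new Object();   // stands for root.a's monitor

    // LeafQueue#assignContainers, restructured per the comment above.
    static void assignContainers() {
        synchronized (leafLock) {
            // part 1: update resource limits, find the reserved container, ...
        }
        // The completed-container path locks the parent, but we no longer
        // hold our own monitor here, so no inversion is possible.
        synchronized (parentLock) {
            // ParentQueue#internalReleaseResource
        }
        synchronized (leafLock) {
            // part 2: the rest of the assignment work
        }
    }

    // ParentQueue#getQueueUserAclInfo: parent monitor, then each child's.
    static void getQueueUserAclInfo() {
        synchronized (parentLock) {
            synchronized (leafLock) {
                // LeafQueue#getQueueUserAclInfo
            }
        }
    }

    // Run both flows concurrently many times; with the split pattern
    // there is no lock cycle, so both threads always finish.
    public static boolean runsToCompletion() {
        Thread a = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) assignContainers();
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) getQueueUserAclInfo();
        });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            return false;
        }
        return true;
    }
}
```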

        Naganarasimha G R added a comment -

        Thanks Junping Du & Tan, Wangda for correcting me; I failed to realize earlier that a write lock has to wait until all read locks are released.
        Tan, Wangda, I agree your solution solves the problem, but the current flow is CapacityScheduler.allocateContainersToNode -> LeafQueue.assignContainers (holds the lock on the leaf) -> LeafQueue.handleExcessReservedContainer -> LeafQueue.completedContainer -> ParentQueue.completedContainer (tries to take the lock here).
        I agree we need to fix this flow, but a simpler temporary correction in ParentQueue (assuming that 2.9/trunk avoids the issue) could be:

        @Override
          public List<QueueUserACLInfo> getQueueUserAclInfo(
              UserGroupInformation user) {
            List<QueueUserACLInfo> userAcls = new ArrayList<QueueUserACLInfo>();
            synchronized (this) {
              // Add parent queue acls
              userAcls.add(getUserAclInfo(user));
            }
            // Add children queue acls
            for (CSQueue child : childQueues) {
              userAcls.addAll(child.getQueueUserAclInfo(user));
            }
        
            return userAcls;
          }
        

        Thoughts?

        Tao Yang added a comment -

        Thanks Naganarasimha G R, Junping Du and Wangda Tan for your suggestions.
        Naganarasimha G R, I think there may be a problem if childQueues is being iterated while ParentQueue#setChildQueues is called at the same time.
        Wangda Tan, I agree your solution solves the problem, but I still think the synchronized modifier on LeafQueue#getQueueUserAclInfo is not required. In my opinion this method doesn't affect the data structures of the LeafQueue instance (it checks the given user's permissions, creates a new QueueUserACLInfo instance and returns it), and it's only called by ParentQueue#getQueueUserAclInfo. As a reference, FSLeafQueue#getQueueUserAclInfo in the FairScheduler is not synchronized.
        Maybe I haven't spotted the potential problem; please correct me if I'm wrong.

        Li Lu added a comment -

        I'm not a scheduler expert, but "not affecting any data structure" sounds like the wrong reason not to synchronize. Tan, Wangda, will there be any potential data races according to the Java memory model [1]? If not, we can safely remove those synchronized keywords; otherwise we have to stick with them no matter how appealing removal appears to be.

        [1]: http://www.cs.umd.edu/~pugh/java/memoryModel/

        Tao Yang added a comment -

        Thanks Li Lu for correcting me; my wording was off, sorry about that. I did look for data races and found none, but I may have missed something.

        Wangda Tan added a comment -

        Thanks all for the comments.

        Tao Yang / Li Lu.

        Yes, removing the synchronized lock will not damage internal data structures, but it could cause inconsistent reads; for example, a queue ACL could be read while it is being updated. So I am not in favor of this solution.

        Naganarasimha G R,

        I still prefer to fix the issue in the scheduling logic. There are other similar code paths, such as getQueueInfo; we need to identify all of them, and they too could return inconsistent data while a queue is being refreshed at the same time.
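The inconsistent-read concern can be made concrete with a small sketch. This is a single-threaded simulation of one possible interleaving, with purely illustrative field names (the real queue ACL state lives elsewhere): an unsynchronized reader that runs between the two writes of an update observes a mixed ACL state that never existed as a whole.

```java
public class TornReadSketch {

    // Stand-ins for two pieces of a queue's ACL state that a
    // refresh updates together under the queue's lock.
    static String submitAcl = "userA";
    static String adminAcl = "userA";

    public static boolean tornReadPossible() {
        submitAcl = "userA";
        adminAcl = "userA";

        // Simulate the interleaving: the refresh has written the
        // first field of the update...
        submitAcl = "userB";

        // ...and an unsynchronized getQueueUserAclInfo-style reader
        // runs right now, before the second write lands:
        boolean torn = !submitAcl.equals(adminAcl);

        // The refresh then completes.
        adminAcl = "userB";
        return torn;   // true: the reader saw a half-applied update
    }
}
```

A synchronized (or read-locked) reader would have waited for the whole update, which is why simply deleting the keyword trades the deadlock for consistency bugs.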

        Li Lu added a comment -

        Thanks Tan, Wangda!

        But it could cause inconsistent reads; for example, a queue ACL could be read while it is being updated.

        Makes sense to me. Let's keep and fix the synchronized blocks then...

        Naganarasimha G R added a comment -

        Thanks Tan, Wangda & Tao Yang

        I think there may be a problem if childQueues is being iterated while ParentQueue#setChildQueues is called at the same time

        Yes, you are right: this happens during CS initialization or reinitialization, and if getQueueUserAclInfo is called during that window some anomalies can occur, since getQueueUserAclInfo does not hold the lock on the CS.

        But it could cause inconsistent reads; for example, a queue ACL could be read while it is being updated. So I am not in favor of this solution.

        Agreed, but IIUC the 2.8 code depends less on locking the child queue, since the ACLs of all the queues are updated in one shot during reinitialization. So to ensure ACLs are returned consistently, I presume we should hold the CS lock in CS.getQueueUserAclInfo, which is not happening currently in 2.8.

        I still prefer to fix the issue in the scheduling logic. There are other similar code paths, such as getQueueInfo.

        Hmm, so are you suggesting applying 2.9/trunk's patch, or reorganizing the flow in the CS with synchronized blocks themselves?

        Tao Yang added a comment -

        Thanks Tan, Wangda & Naganarasimha G R !

        Agreed, but IIUC the 2.8 code depends less on locking the child queue, since the ACLs of all the queues are updated in one shot during reinitialization

        We also noticed that the LeafQueue instance lock is not held when ACLs are updated (CapacityScheduler#setQueueAcls), so the current logic doesn't guarantee ACL consistency anyway.

        So to ensure ACLs are returned consistently, I presume we should hold the CS lock in CS.getQueueUserAclInfo, which is not happening currently in 2.8.

        I'm not clear on this. Is it worth ensuring ACL consistency at the cost of scheduler efficiency?

        Wangda Tan added a comment -

        I'm not clear on this. Is it worth ensuring ACL consistency at the cost of scheduler efficiency?

        It would be inefficient; previously getQueueInfo held the scheduler lock, and that caused problems.

        We also noticed that the LeafQueue instance lock is not held when ACLs are updated (CapacityScheduler#setQueueAcls), so the current logic doesn't guarantee ACL consistency anyway.

        Yeah, you're correct...
        I think we could get the queue ACL info directly from the CS by invoking authorizer#checkPermissions, with a separate lock to protect permission get/set. cc: Jian He
        But that should be a separate patch, since we would need to fix getQueueInfo as well.

        I think we can go ahead and fix the locks inside LQ#assignContainers. Thoughts?

        Wangda Tan added a comment -

        In addition, I suggest downgrading the severity to critical to unblock 2.8, since this only happens rarely.

        Tao Yang added a comment -

        Thanks Tan, Wangda.
        Updated the priority to Critical and attached a new patch for review.
        The patch needs to add indentation inside the new synchronized blocks; the diff without the whitespace changes looks like this:

           @Override
        -  public synchronized CSAssignment assignContainers(Resource clusterResource,
        +  public CSAssignment assignContainers(Resource clusterResource,
               FiCaSchedulerNode node, ResourceLimits currentResourceLimits,
               SchedulingMode schedulingMode) {
        +    synchronized (this) {
               updateCurrentResourceLimits(currentResourceLimits, clusterResource);
        
               if (LOG.isDebugEnabled()) {
        @@ -906,6 +907,7 @@ public synchronized CSAssignment assignContainers(Resource clusterResource,
               }
        
               setPreemptionAllowed(currentResourceLimits, node.getPartition());
        +    }
        
             // Check for reserved resources
             RMContainer reservedContainer = node.getReservedContainer();
        @@ -923,6 +925,7 @@ public synchronized CSAssignment assignContainers(Resource clusterResource,
               }
             }
        
        +    synchronized (this) {
               // if our queue cannot access this node, just return
               if (schedulingMode == SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY
                   && !accessibleToPartition(node.getPartition())) {
        @@ -1019,6 +1022,7 @@ public synchronized CSAssignment assignContainers(Resource clusterResource,
        
               return CSAssignment.NULL_ASSIGNMENT;
             }
        +  }
        
        Wangda Tan added a comment -

        Thanks Tao Yang, this generally looks good, but we need to wrap application.assignContainers inside the LeafQueue's synchronized lock as well. How about changing the first synchronized section like this:

            FiCaSchedulerApp reservedApp = null;
            CSAssignment reservedCSAssignment = null;
        
            synchronized (this) {
              updateCurrentResourceLimits(currentResourceLimits, clusterResource);
        
              if (LOG.isDebugEnabled()) {
                LOG.debug(
                    "assignContainers: node=" + node.getNodeName() + " #applications="
                        + orderingPolicy.getNumSchedulableEntities());
              }
        
              setPreemptionAllowed(currentResourceLimits, node.getPartition());
        
              // Check for reserved resources
              RMContainer reservedContainer = node.getReservedContainer();
              if (reservedContainer != null) {
                reservedApp = getApplication(
                    reservedContainer.getApplicationAttemptId());
                synchronized (reservedApp) {
                  reservedCSAssignment = reservedApp.assignContainers(
                      clusterResource, node, currentResourceLimits, schedulingMode,
                      reservedContainer);
                }
              }
            }
        
            // Handle possible completedContainer out of synchronized lock to avoid
            // deadlock.
            if (reservedCSAssignment != null) {
              handleExcessReservedContainer(clusterResource, reservedCSAssignment, node,
                  reservedApp);
              killToPreemptContainers(clusterResource, node, reservedCSAssignment);
              return reservedCSAssignment;
            }
            
           synchronized(this) { ... }
        
        Wangda Tan added a comment -

        Also, the patch needs to be renamed to "YARN-6029-branch-2.8.(version).patch" so that Jenkins tests it against the correct branch.

        Naganarasimha G R added a comment -

        Thanks Tan, Wangda and Tao Yang.
        Yes, I agree with Tan, Wangda that this is where we need to fix it, since the cause of the deadlock is the reversed lock order rather than the ParentQueue calling the child queue (which can happen in other flows too). And the solution looks much simpler than I initially thought!

        In addition, I suggest downgrading the severity to critical to unblock 2.8, since this only happens rarely.

        Since the solution looks simple and can go in quickly, I hope it can make 2.8 itself; based on your previous comment, I presume you plan the same.

        I also checked the other places where getParent() is called in LeafQueue and its call hierarchy; there seem to be no other similar issues that could cause a deadlock.

        Tao Yang added a comment -

        Thanks Tan, Wangda for correcting me. I failed to realize that calling application.assignContainers without the LeafQueue instance lock can cause data inconsistency for apps (applicationAttemptMap). I'll adopt your suggestion and update the patch later.

        Tao Yang added a comment -

        Patch updated.
        I've updated the assignee to Wangda Tan, since I contributed little to the final solution. Thanks.

        Naganarasimha G R added a comment -

        Thanks Tao Yang and Tan, Wangda.
        Overall I am OK with the patch, but one last clarification on the comments above:

        but we need to wrap application.assignContainers with leafqueue's synchronized lock as well.

        failed to realize that calling application.assignContainers without the LeafQueue instance lock can cause data inconsistency for apps (applicationAttemptMap).

        I don't quite understand why exactly we need the LeafQueue's lock while calling application.assignContainers, nor the data inconsistency issue for apps (applicationAttemptMap). Can you please elaborate?

        Sunil G added a comment -

        Maybe I did not correctly understand what Naganarasimha Garla meant. As far as I understood, application.assignContainers is invoked from one node heartbeat under the write lock of the LeafQueue. If another node heartbeat happens at the same time and operates on the same leaf queue, we need to hold off such invocations, correct? So I think we need that lock while operating on app allocations as well. Please add anything I missed.

        Naganarasimha G R added a comment -

        Submitting the patch to trigger Jenkins.

        Tao Yang added a comment -

        Hi Naganarasimha G R,
        I thought the application could be finished by another thread calling LQ#finishApplicationAttempt in the meantime; the app might then still be assigned a container after it has finished.

        leftnoteasy Wangda Tan added a comment -

        Just reassigned it back. Tao Yang, there is typically no need to reassign if you're still actively working on it, and your reporting and analysis of the problem is the key to the final solution.

        Latest patch looks good to me as well.

        Naganarasimha Garla, I understand there could be some potential improvements we could make to the locking of queues / apps. I tried to keep the original logic as much as possible while working on YARN-3140/3141/5706, just to be safe. Unless it becomes a performance bottleneck, I don't suggest making big changes to this logic.

        Jenkins has some issues right now; I will rekick it once it is back to normal.

        Naganarasimha Naganarasimha G R added a comment -

        Agree with you, Tan, Wangda: the less modification now, the better, and I agree that trunk already has optimizations for this.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 14m 13s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 mvninstall 8m 45s branch-2.8 passed
        +1 compile 0m 28s branch-2.8 passed with JDK v1.8.0_111
        +1 compile 0m 31s branch-2.8 passed with JDK v1.7.0_121
        +1 checkstyle 0m 19s branch-2.8 passed
        +1 mvnsite 0m 38s branch-2.8 passed
        +1 mvneclipse 0m 18s branch-2.8 passed
        +1 findbugs 1m 11s branch-2.8 passed
        +1 javadoc 0m 20s branch-2.8 passed with JDK v1.8.0_111
        +1 javadoc 0m 23s branch-2.8 passed with JDK v1.7.0_121
        +1 mvninstall 0m 30s the patch passed
        +1 compile 0m 25s the patch passed with JDK v1.8.0_111
        +1 javac 0m 25s the patch passed
        +1 compile 0m 29s the patch passed with JDK v1.7.0_121
        +1 javac 0m 29s the patch passed
        -0 checkstyle 0m 16s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 4 new + 52 unchanged - 2 fixed = 56 total (was 54)
        +1 mvnsite 0m 34s the patch passed
        +1 mvneclipse 0m 13s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 17s the patch passed
        +1 javadoc 0m 17s the patch passed with JDK v1.8.0_111
        +1 javadoc 0m 20s the patch passed with JDK v1.7.0_121
        -1 unit 73m 57s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_121.
        +1 asflicense 0m 19s The patch does not generate ASF License warnings.
        180m 34s



        Reason Tests
        JDK v1.8.0_111 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
          hadoop.yarn.server.resourcemanager.TestAMAuthorization
        JDK v1.7.0_121 Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
          hadoop.yarn.server.resourcemanager.TestAMAuthorization



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:5af2af1
        JIRA Issue YARN-6029
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12845143/YARN-6029-branch-2.8.001.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux df9293be45a2 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision branch-2.8 / d61af93
        Default Java 1.7.0_121
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_111 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14530/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/14530/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_121.txt
        JDK v1.7.0_121 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14530/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/14530/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        leftnoteasy Wangda Tan added a comment -

        Committing ...

        leftnoteasy Wangda Tan added a comment -

        Committed to branch-2.8. Thanks to Tao Yang for working on the patch, and thanks for the reviews from Naganarasimha G R / Li Lu / Junping Du!


          People

          • Assignee: Tao Yang
          • Reporter: Tao Yang
          • Votes: 0
          • Watchers: 12