Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-6029

CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by one thread and LeafQueue#assignContainers is releasing excessive reserved container by another thread

Log workAgile BoardRank to TopRank to BottomAttach filesAttach ScreenshotBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • 2.8.0
    • 2.8.0
    • capacityscheduler
    • None
    • Reviewed

    Description

      When ParentQueue#getQueueUserAclInfo is called (e.g. a client calls YarnClient#getQueueAclsInfo) just at the moment that LeafQueue#assignContainers is called and before notifying parent queue to release resource (should release a reserved container), then ResourceManager can deadlock. I found this problem on our testing environment for hadoop2.8.

      Reproduce the deadlock in chronological order

      • 1. Thread A (ResourceManager Event Processor) calls synchronized LeafQueue#assignContainers (got LeafQueue instance lock of queue root.a)
      • 2. Thread B (IPC Server handler) calls synchronized ParentQueue#getQueueUserAclInfo (got ParentQueue instance lock of queue root), iterates over children queue acls and is blocked when calling synchronized LeafQueue#getQueueUserAclInfo (the LeafQueue instance lock of queue root.a is hold by Thread A)
      • 3. Thread A wants to inform the parent queue that a container is being completed and is blocked when invoking synchronized ParentQueue#internalReleaseResource method (the ParentQueue instance lock of queue root is hold by Thread B)

      I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be removed to solve this problem, since this method appears to not affect fields of LeafQueue instance.

      Attach patch with UT for review.

      Attachments

        1. YARN-6029-branch-2.8.001.patch
          11 kB
          Tao Yang
        2. YARN-6029.002.patch
          10 kB
          Tao Yang
        3. YARN-6029.001.patch
          9 kB
          Tao Yang
        4. deadlock.jstack
          187 kB
          Tao Yang

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Tao Yang Tao Yang Assign to me
            Tao Yang Tao Yang
            Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment