Hadoop YARN / YARN-6029

CapacityScheduler deadlock when ParentQueue#getQueueUserAclInfo is called by one thread and LeafQueue#assignContainers is releasing excessive reserved container by another thread

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.8.0
    • Component/s: capacityscheduler
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      If ParentQueue#getQueueUserAclInfo is called (e.g. when a client calls YarnClient#getQueueAclsInfo) just after LeafQueue#assignContainers has been entered, but before it notifies the parent queue to release resources (here, to release a reserved container), the ResourceManager can deadlock. I found this problem in our testing environment for Hadoop 2.8.

      Steps to reproduce the deadlock, in chronological order:

      • 1. Thread A (ResourceManager Event Processor) calls the synchronized method LeafQueue#assignContainers and acquires the LeafQueue instance lock of queue root.a.
      • 2. Thread B (IPC Server handler) calls the synchronized method ParentQueue#getQueueUserAclInfo and acquires the ParentQueue instance lock of queue root. It iterates over the child queues' ACLs and blocks when calling the synchronized method LeafQueue#getQueueUserAclInfo, because the LeafQueue instance lock of queue root.a is held by Thread A.
      • 3. Thread A wants to inform the parent queue that a container is being completed and blocks when invoking the synchronized method ParentQueue#internalReleaseResource, because the ParentQueue instance lock of queue root is held by Thread B.
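
      The interleaving above is a classic lock-ordering inversion. The following is only a minimal sketch of it, not the CapacityScheduler code: two plain monitor objects stand in for the instance locks of root and root.a, and the class name, thread names, and latches are hypothetical, added solely to force the ordering deterministically. It asks the JVM whether it sees a monitor deadlock, which is the same condition a jstack dump would report.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class LockOrderSketch {

    /** Forces the three steps and returns true if the JVM reports a monitor deadlock. */
    static boolean reproduce() {
        // Hypothetical stand-ins for the instance locks of root and root.a.
        final Object parentLock = new Object();
        final Object leafLock = new Object();
        final CountDownLatch leafHeld = new CountDownLatch(1);
        final CountDownLatch parentHeld = new CountDownLatch(1);

        Thread a = new Thread(() -> {
            synchronized (leafLock) {            // step 1: like LeafQueue#assignContainers
                leafHeld.countDown();
                awaitQuietly(parentHeld);        // wait until Thread B owns the parent lock
                synchronized (parentLock) { }    // step 3: like ParentQueue#internalReleaseResource
            }
        }, "thread-A-event-processor");

        Thread b = new Thread(() -> {
            awaitQuietly(leafHeld);              // make sure Thread A went first
            synchronized (parentLock) {          // step 2: like ParentQueue#getQueueUserAclInfo
                parentHeld.countDown();
                synchronized (leafLock) { }      // like the synchronized LeafQueue#getQueueUserAclInfo
            }
        }, "thread-B-ipc-handler");

        // Daemon threads, so the stuck pair does not keep the JVM alive.
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {          // poll for up to about 5 seconds
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + reproduce());
    }
}
```

      Thread A takes the locks in the order leaf then parent, while Thread B takes them in the order parent then leaf; once each holds its first lock, neither can make progress.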

      I think the synchronized modifier of LeafQueue#getQueueUserAclInfo can be removed to solve this problem, since this method does not appear to modify any fields of the LeafQueue instance.
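
      To illustrate why dropping that modifier would break the cycle, here is a rough sketch under the same forced interleaving. The Parent and Leaf classes are hypothetical miniatures, not the real ParentQueue and LeafQueue; the ACL strings and the latches are placeholders. It is not the attached patch, only a model of the proposal in which the leaf's ACL lookup no longer needs the leaf lock.

```java
import java.util.concurrent.CountDownLatch;

class UnsyncAclSketch {

    // Hypothetical miniature of a parent queue (root).
    static class Parent {
        Leaf child;
        final CountDownLatch parentHeld;

        Parent(CountDownLatch parentHeld) {
            this.parentHeld = parentHeld;
        }

        synchronized String getQueueUserAclInfo() {
            parentHeld.countDown();                  // Thread B now owns the parent lock
            return "root-acls;" + child.getQueueUserAclInfo();
        }

        synchronized void internalReleaseResource() {
            // Bookkeeping would happen here in a real scheduler.
        }
    }

    // Hypothetical miniature of a leaf queue (root.a).
    static class Leaf {
        final Parent parent;
        final CountDownLatch leafHeld;

        Leaf(Parent parent, CountDownLatch leafHeld) {
            this.parent = parent;
            this.leafHeld = leafHeld;
        }

        synchronized void assignContainers() {
            leafHeld.countDown();                    // Thread A now owns the leaf lock
            awaitQuietly(parent.parentHeld);         // let Thread B grab the parent lock first
            parent.internalReleaseResource();        // waits only until Thread B returns
        }

        // Not synchronized: reads no mutable state in this sketch.
        String getQueueUserAclInfo() {
            return "root.a-acls";
        }
    }

    /** Returns true if both threads finish within the timeout. */
    static boolean run() {
        CountDownLatch leafHeld = new CountDownLatch(1);
        CountDownLatch parentHeld = new CountDownLatch(1);
        Parent root = new Parent(parentHeld);
        Leaf leaf = new Leaf(root, leafHeld);
        root.child = leaf;

        Thread a = new Thread(leaf::assignContainers, "thread-A");
        Thread b = new Thread(() -> {
            awaitQuietly(leafHeld);
            root.getQueueUserAclInfo();
        }, "thread-B");
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();
        try {
            a.join(5000);
            b.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !a.isAlive() && !b.isAlive();
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("both threads finished: " + run());
    }
}
```

      Thread B never waits for the leaf lock, so it releases the parent lock promptly and Thread A's call into the parent can proceed. This is only safe if the unsynchronized method really touches no mutable leaf state, which is the assumption stated in the description.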

      A patch with a unit test is attached for review.

      Attachments

        1. YARN-6029-branch-2.8.001.patch (11 kB, Tao Yang)
        2. YARN-6029.002.patch (10 kB, Tao Yang)
        3. YARN-6029.001.patch (9 kB, Tao Yang)
        4. deadlock.jstack (187 kB, Tao Yang)

          People

            Assignee: Tao Yang
            Reporter: Tao Yang