  Hadoop YARN / YARN-325

RM CapacityScheduler can deadlock when getQueueInfo() is called and a container is completing

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0.2-alpha, 0.23.5
    • Fix Version/s: 2.0.3-alpha, 0.23.6
    • Component/s: capacityscheduler
    • Labels: None

    Description

      If a client calls getQueueInfo() on a parent queue (e.g., the root queue) while containers are completing, the RM can deadlock. getQueueInfo() locks the ParentQueue and then calls each child queue's getQueueInfo() method in turn. However, when a container completes, the scheduler locks the LeafQueue and then calls back into the ParentQueue. The two paths take the same pair of locks in opposite order, so when they run concurrently it's a recipe for deadlock.

      Stacktrace to follow.
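
      The lock-order inversion described above can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: the class LockOrderSketch, the PARENT and LEAF locks, and the helper methods are stand-ins and not the actual CapacityScheduler code. It uses ReentrantLock.tryLock with a timeout so the demonstration terminates, whereas a blocking monitor gives no such escape. The second method shows one common remedy, acquiring locks in a single consistent order; that is an assumption for illustration and not necessarily what the attached patches do.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderSketch {
    // Hypothetical stand-ins for the ParentQueue and LeafQueue locks.
    static final ReentrantLock PARENT = new ReentrantLock();
    static final ReentrantLock LEAF = new ReentrantLock();

    // Takes 'first', waits until the peer also holds its first lock, then
    // tries 'second' with a timeout instead of blocking forever.
    static Thread worker(ReentrantLock first, ReentrantLock second,
                         CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                         boolean[] acquired, int slot) {
        return new Thread(() -> {
            first.lock();
            try {
                bothHoldFirst.countDown();
                bothHoldFirst.await();
                acquired[slot] = second.tryLock(200, TimeUnit.MILLISECONDS);
                if (acquired[slot]) {
                    second.unlock();
                }
                // Keep holding 'first' until the peer has also finished trying.
                bothTried.countDown();
                bothTried.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                first.unlock();
            }
        });
    }

    static void joinAll(Thread... threads) {
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Opposite acquisition order: returns true if neither thread could
    // obtain its second lock, i.e. each was stuck waiting on the other.
    static boolean invertedOrderStalls() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] acquired = new boolean[2];
        // Path 1 mimics getQueueInfo(): parent first, then leaf.
        Thread queueInfo = worker(PARENT, LEAF, bothHoldFirst, bothTried, acquired, 0);
        // Path 2 mimics container completion: leaf first, then parent.
        Thread completion = worker(LEAF, PARENT, bothHoldFirst, bothTried, acquired, 1);
        queueInfo.start();
        completion.start();
        joinAll(queueInfo, completion);
        return !acquired[0] && !acquired[1];
    }

    // Same order (parent, then leaf) on both paths: no cycle can form,
    // so both threads run to completion.
    static boolean consistentOrderCompletes() {
        int[] done = new int[1];
        Runnable task = () -> {
            PARENT.lock();
            try {
                LEAF.lock();
                try {
                    done[0]++;
                } finally {
                    LEAF.unlock();
                }
            } finally {
                PARENT.unlock();
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        joinAll(a, b);
        return done[0] == 2;
    }

    public static void main(String[] args) {
        System.out.println("inverted order stalls: " + invertedOrderStalls());
        System.out.println("consistent order completes: " + consistentOrderCompletes());
    }
}
```

      In the inverted case each thread holds one lock while waiting for the other, which is the cycle the description refers to. With plain blocking acquisition there is no timeout, so both threads would wait indefinitely.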

      Attachments

        1. YARN-325.patch
          7 kB
          Arun Murthy
        2. YARN-325.patch
          9 kB
          Arun Murthy
        3. YARN-325-branch23.patch
          8 kB
          Thomas Graves

          People

            Assignee: Arun Murthy (acmurthy)
            Reporter: Jason Darrell Lowe (jlowe)
            Votes: 0
            Watchers: 7
