  1. Hadoop YARN
  2. YARN-3091 [Umbrella] Improve and fix locks of RM scheduler
  3. YARN-4416

Deadlock due to synchronised get Methods in AbstractCSQueue


Details

    Description

      While debugging in Eclipse, I needed to find out the name of a queue, but every attempt to inspect the queue hung. The stack showed a deadlock; on analysis, it turned out to be caused only by the debugger calling queue.toString(), because AbstractCSQueue.getAbsoluteUsedCapacity was synchronized.
      Hence we need to ensure the following:

      1. queueCapacity and resource-usage each have their own read/write lock, so synchronization is not required.
      2. numContainers is volatile, so synchronization is not required.
      3. A read/write lock could be added to the Ordering Policy. Read operations don't need to be synchronized, so getNumApplications doesn't need synchronized.
        (The first two will be handled in this JIRA and the third in YARN-4443.)
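
      The first two points can be illustrated with a minimal sketch. It is not the actual AbstractCSQueue code or the attached patch: the class SketchQueue, its fields, and its method names are hypothetical stand-ins. It only shows why a getter backed by its own read/write lock, or by a volatile field, lets toString() return even while another thread holds the queue's monitor.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a scheduler queue; not the real AbstractCSQueue.
class SketchQueue {
    private final String name;

    // The capacity value has its own read/write lock, so its getter
    // does not need the queue's monitor.
    private final ReentrantReadWriteLock capacityLock = new ReentrantReadWriteLock();
    private float absoluteUsedCapacity;

    // A volatile field is safely visible to unsynchronized readers.
    private volatile int numContainers;

    SketchQueue(String name) {
        this.name = name;
    }

    void setAbsoluteUsedCapacity(float value) {
        capacityLock.writeLock().lock();
        try {
            absoluteUsedCapacity = value;
        } finally {
            capacityLock.writeLock().unlock();
        }
    }

    // Not synchronized: only the capacity read lock is taken.
    float getAbsoluteUsedCapacity() {
        capacityLock.readLock().lock();
        try {
            return absoluteUsedCapacity;
        } finally {
            capacityLock.readLock().unlock();
        }
    }

    // Writers still serialize on the monitor; only the read path avoids it.
    synchronized void allocateContainer() {
        numContainers++;
    }

    // Not synchronized: a plain volatile read.
    int getNumContainers() {
        return numContainers;
    }

    @Override
    public String toString() {
        return name + ": absoluteUsedCapacity=" + getAbsoluteUsedCapacity()
            + ", numContainers=" + getNumContainers();
    }

    public static void main(String[] args) throws InterruptedException {
        SketchQueue queue = new SketchQueue("root.default");
        queue.setAbsoluteUsedCapacity(0.25f);
        queue.allocateContainer();

        // Hold the queue monitor, as a scheduler thread might, while a
        // second thread (playing the debugger) calls toString().
        synchronized (queue) {
            Thread reader = new Thread(() -> System.out.println(queue));
            reader.start();
            reader.join(2000);
            System.out.println("reader finished: " + !reader.isAlive());
        }
    }
}
```

      If getAbsoluteUsedCapacity() were declared synchronized in this sketch, the reader thread would block until the monitor was released, which mirrors the hang seen in the debugger.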

      Attachments

        1. deadlock.log
          161 kB
          Naganarasimha G R
        2. YARN-4416.v1.001.patch
          12 kB
          Naganarasimha G R
        3. YARN-4416.v1.002.patch
          13 kB
          Naganarasimha G R
        4. YARN-4416.v2.001.patch
          5 kB
          Naganarasimha G R
        5. YARN-4416.v2.002.patch
          5 kB
          Naganarasimha G R
        6. YARN-4416.v2.003.patch
          5 kB
          Naganarasimha G R

          People

            Assignee: Naganarasimha G R
            Reporter: Naganarasimha G R
            Votes: 0
            Watchers: 7
