Apache Tez / TEZ-3297

Deadlock scenario in AM during ShuffleVertexManager auto reduce

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.2, 0.9.0, 0.8.4
    • Component/s: None
    • Labels: None

    Description

      Here is what's happening in the attached thread dump.

      App Pool thread #9 performs the auto reduce on V2 and initializes the new edge manager. It holds the V2 write lock and wants the read lock of the source vertex V1.

      At the same time, another App Pool thread, #2, schedules a task of V1 and gets the output spec, so it holds the V1 read lock and wants the V2 read lock.

      Meanwhile, the dispatcher thread wants the V1 write lock to begin the state machine transition. Because the dispatcher thread is at the head of the V1 ReadWriteLock queue, thread #9 cannot get the V1 read lock even though thread #2 is already holding it.

      This is a circular lock scenario: #2 blocks the dispatcher, the dispatcher blocks #9, and #9 blocks #2.

      The ReadWriteLock is behaving as designed in this case; see this Java bug report: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6816565.
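
The V1 half of the cycle can be reproduced in isolation. The following is a minimal sketch, not Tez code: the class name, method name, and thread roles are hypothetical, and a plain default (non-fair) java.util.concurrent.locks.ReentrantReadWriteLock is assumed to stand in for the V1 lock. One thread holds the read lock (the role of #2), a second thread queues for the write lock (the dispatcher), and the calling thread then makes a timed attempt on the read lock (the role of #9). With a writer waiting at the head of the queue, that attempt is expected to time out even though the lock is only read-held.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class QueuedWriterBlocksReader {

    /**
     * Returns true if a new reader obtained the read lock within waitMillis
     * while another reader held it and a writer was already queued.
     */
    static boolean newReaderGetsLock(long waitMillis) throws InterruptedException {
        // Hypothetical stand-in for the V1 vertex lock (default, non-fair mode).
        ReentrantReadWriteLock v1Lock = new ReentrantReadWriteLock();
        CountDownLatch readerHolding = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Role of thread #2: takes the read lock and keeps holding it.
        Thread holder = new Thread(() -> {
            v1Lock.readLock().lock();
            try {
                readerHolding.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                v1Lock.readLock().unlock();
            }
        });
        holder.start();
        readerHolding.await();

        // Role of the dispatcher: queues for the write lock behind the reader.
        Thread writer = new Thread(() -> {
            v1Lock.writeLock().lock();
            v1Lock.writeLock().unlock();
        });
        writer.start();
        // Wait until the writer is actually parked in the lock's wait queue.
        while (!v1Lock.hasQueuedThread(writer)) {
            Thread.sleep(1);
        }

        // Role of thread #9: a timed read-lock attempt. The timed tryLock
        // respects the queued writer; the untimed tryLock() would barge.
        boolean acquired = v1Lock.readLock().tryLock(waitMillis, TimeUnit.MILLISECONDS);
        if (acquired) {
            v1Lock.readLock().unlock();
        }

        // Clean up: let the holder go so the writer can run and finish.
        release.countDown();
        holder.join();
        writer.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("new reader acquired: " + newReaderGetsLock(200));
    }
}
```

In the scenario described above there is no timeout: thread #9 waits indefinitely for the V1 read lock while holding the V2 write lock that thread #2 needs, which closes the cycle.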

      Attachments

        1. am_log
          3.45 MB
          Zhiyuan Yang
        2. TEZ-3297.1.patch
          6 kB
          Rajesh Balamohan
        3. TEZ-3297.2.branch-0.7.patch
          7 kB
          Rajesh Balamohan
        4. TEZ-3297.2.patch
          7 kB
          Rajesh Balamohan
        5. thread_dump
          81 kB
          Zhiyuan Yang

          People

            Assignee: Rajesh Balamohan (rajesh.balamohan)
            Reporter: Zhiyuan Yang (zhiyuany)