Apache Tez / TEZ-3297

Deadlock scenario in AM during ShuffleVertexManager auto reduce

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.2, 0.9.0, 0.8.4
    • Component/s: None
    • Labels: None

    Description

      Here is what's happening in the attached thread dump.

      App Pool thread #9 does the auto reduce on V2 and initializes the new edge manager; it holds the V2 write lock and wants the read lock of source vertex V1.

      At the same time, another App Pool thread #2 schedules a task of V1 and gets the output spec, so it holds the V1 read lock and wants the V2 read lock.

      Also, the dispatcher thread wants the V1 write lock to begin a state machine transition. Since the dispatcher thread is at the head of the V1 ReadWriteLock queue, thread #9 cannot get the V1 read lock even though thread #2 is already holding the V1 read lock.

      This is a circular wait: #2 blocks the dispatcher, the dispatcher blocks #9, and #9 blocks #2.

      There is no problem with the ReadWriteLock behavior itself in this case; see this Java bug report: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6816565.
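
      To make the queue behavior concrete, here is a small standalone Java sketch (illustrative only, not Tez code; the thread roles and timings are assumptions mirroring the thread dump): with the default non-fair ReentrantReadWriteLock, a timed read-lock attempt made after a writer has queued will not succeed, even while another thread still holds the read lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Standalone illustration (not Tez code): a second reader stalls behind a queued writer. */
public class ReadWriteLockQueueDemo {
  public static void main(String[] args) throws Exception {
    ReentrantReadWriteLock v1Lock = new ReentrantReadWriteLock(); // non-fair by default
    CountDownLatch readerHoldsLock = new CountDownLatch(1);

    // "Thread #2": holds the V1 read lock for a while (in the real deadlock it is itself
    // blocked waiting for the V2 read lock).
    Thread reader1 = new Thread(() -> {
      v1Lock.readLock().lock();
      try {
        readerHoldsLock.countDown();
        Thread.sleep(3000);               // simulate being blocked on V2
      } catch (InterruptedException ignored) {
      } finally {
        v1Lock.readLock().unlock();
      }
    });

    // "Dispatcher": queues up for the V1 write lock and parks behind reader1.
    Thread writer = new Thread(() -> {
      v1Lock.writeLock().lock();          // blocks until reader1 releases
      v1Lock.writeLock().unlock();
    });

    reader1.start();
    readerHoldsLock.await();
    writer.start();
    Thread.sleep(200);                    // give the writer time to enqueue

    // "Thread #9": the timed read-lock attempt fails because a writer is at the head of the
    // wait queue, even though reader1 still holds the read lock (the behavior documented in
    // the JDK bug report linked above -- by design, not a bug).
    boolean acquired = v1Lock.readLock().tryLock(1, TimeUnit.SECONDS);
    System.out.println("Thread #9 got V1 read lock while a writer is queued? " + acquired);

    reader1.join();
    writer.join();
  }
}
```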

      Attachments

        1. am_log
          3.45 MB
          Zhiyuan Yang
        2. TEZ-3297.1.patch
          6 kB
          Rajesh Balamohan
        3. TEZ-3297.2.branch-0.7.patch
          7 kB
          Rajesh Balamohan
        4. TEZ-3297.2.patch
          7 kB
          Rajesh Balamohan
        5. thread_dump
          81 kB
          Zhiyuan Yang

        Activity

          rajesh.balamohan Rajesh Balamohan added a comment -

          Attaching .1 patch for review.
          sseth Siddharth Seth added a comment -

          This looks good to me. The readLock used to obtain information from the Input/Output vertices is now obtained outside of the read/write lock of the vertex itself. +1.
          A couple of things may be worth adding to the patch:

          • Some commentary around why this change is being made.
          • Potentially some asserts (maybe Preconditions) around the state of the locks during calls to upstream / downstream vertices in these two methods.

          I believe this fixes things for now. It would be useful to look at how to make this simpler in the future. One potential option is to have all write transitions on a vertex go through the central dispatcher (instead of direct calls in from the VertexManager). This should work fairly well in terms of restricting the number of write requests.

          Going through the code a little more, some potential issues unrelated to this patch:
          I think any method which invokes a method on the EdgeManager, which in turn invokes the actual Edge, from within a lock, is prone to such deadlocks, since the Edges themselves have access to 'getSourceVertexNumTasks' and 'getDestinationVertexNumTasks'.
          Another potential issue is with handleInitEvent, which accesses the upstream totalTasks from within a writeLock on the current vertex.
          setInputVertices / setOutputVertices should acquire a writeLock for visibility.
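
          To illustrate the re-ordering being discussed, here is a hedged sketch (hypothetical class, field, and method names; not the actual VertexImpl code) contrasting the deadlock-prone nested acquisition with the pattern the patch moves to:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of the lock re-ordering; names do not match the real Tez code. */
class VertexSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private VertexSketch sourceVertex;   // upstream vertex, wired up during DAG setup
  private int numTasks;

  // Mirrors the review note that setInputVertices/setOutputVertices should take the
  // write lock so the reference is safely visible to other threads.
  void setSourceVertex(VertexSketch src) {
    lock.writeLock().lock();
    try {
      this.sourceVertex = src;
    } finally {
      lock.writeLock().unlock();
    }
  }

  int getTotalTasks() {
    lock.readLock().lock();
    try {
      return numTasks;
    } finally {
      lock.readLock().unlock();
    }
  }

  // Deadlock-prone shape: asks the upstream vertex for information while still holding
  // this vertex's write lock, so two vertices' locks are nested in caller-dependent order.
  void reconfigureNested() {
    lock.writeLock().lock();
    try {
      int srcTasks = sourceVertex.getTotalTasks();  // nested: needs the source's read lock
      numTasks = Math.max(1, srcTasks / 2);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Shape after the fix: read the upstream vertex first, without holding this vertex's
  // lock, then take the write lock only to update local state.
  void reconfigureReordered() {
    int srcTasks = sourceVertex.getTotalTasks();    // source lock taken and released here
    lock.writeLock().lock();
    try {
      numTasks = Math.max(1, srcTasks / 2);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```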


          rajesh.balamohan Rajesh Balamohan added a comment -

          Thanks for the review, sseth.

          Added comments + added writeLock in setInputVertices/setOutputVertices.
          Not sure if we can add asserts currently, as they shouldn't hide any other potential issues.
          tezqa TezQA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12809375/TEZ-3297.2.patch
          against master revision 2c21285.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.tez.dag.app.rm.TestContainerReuse

          Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1786//testReport/
          Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1786//console

          This message is automatically generated.

          hitesh Hitesh Shah added a comment -

          \cc jlowe jeagles in case this has been seen before


          jeagles Jonathan Turner Eagles added a comment -

          In case this is applicable to branch-0.7, can we get this checked into that line as well?
          bikassaha Bikas Saha added a comment -

          I am not sure we can simply remove the lock since it may affect visibility. Also, the assumption that the task count won't change may be inaccurate in the future. With progressive creation of splits, the task count may change with time. Similarly, input/output specs are theoretically pluggable and different per task. Let's be cautious w.r.t. these future features when fixing this issue, else we may forget about it later on. A deadlock could sometimes be better than wrong results.

          sseth Siddharth Seth added a comment -

          bikassaha - the patch re-orders locks and obtains them when required. What it's doing is making calls to upstream / downstream vertices after releasing the lock on the current vertex. Visibility is taken care of. For root inputs, TaskSpecs continue to be per task.
          Is there a specific issue which you see with the patch?

          bikassaha Bikas Saha added a comment -

          Looking at the code further, it looks like the crucial change is not holding the vertex's own lock while trying to take the src/dest vertex read lock. That makes sense; the nested acquisition was a lock-ordering issue waiting to happen. Perhaps a quick scan for such nested locking is in order, in case it hasn't already been done.

          The removal of the overall lock is fine since each internal method invocation like getTotalTasks() already handles its own locking.

          lgtm.

          Moving VM-invoked sync calls onto the dispatcher is a good idea but would need the addition of new callbacks into the VM to notify it of completion of the requested vertex state change operation. Since most current VMs don't do much after changing parallelism, the change might be simpler to implement now. Not sure about Hive custom VMs.


          rajesh.balamohan Rajesh Balamohan added a comment -

          Thanks a lot sseth, bikassaha. Will commit it shortly.

          jeagles - Attaching a patch for branch-0.7 as well. Will commit it to branch-0.7.
          tezqa TezQA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12809732/TEZ-3297.2.branch-0.7.patch
          against master revision 1d11ad2.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1793//console

          This message is automatically generated.


          rajesh.balamohan Rajesh Balamohan added a comment -

          Committed to master:
          commit 51fb3e4cf0213a39e52476127f390639247e089b

          branch-0.8:
          commit 087bf0227e332aa14000688fd3ff20230e546563

          branch-0.7:
          commit f478befcc4ac7bae3af0516ce13e3e2144be7afd
          zhiyuany Zhiyuan Yang added a comment -

          Added 0.8.4 and 0.9.0 to 'Fix Version' since the patch was committed to these branches also.


          People

            Assignee: rajesh.balamohan Rajesh Balamohan
            Reporter: zhiyuany Zhiyuan Yang
            Votes: 0
            Watchers: 11

            Dates

              Created:
              Updated:
              Resolved: