Hive
  1. Hive
  2. HIVE-3913

Possible deadlock in ZK lock manager

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0
    • Component/s: None
    • Labels:
      None

      Description

      ZK Hive lock manager can get into a state when the connection is closed, but no reconnection is attempted.

      1. D8097.1.patch
        2 kB
        Phabricator

        Activity

        Hide
        Ashutosh Chauhan added a comment -

        Mikhail,
        A recent fix went into trunk in this area via HIVE-3537, so you may want to test your scenario on latest trunk.

        Show
        Ashutosh Chauhan added a comment - Mikhail, A recent fix went into trunk in this area via HIVE-3537 , so you may want to test your scenario on latest trunk.
        Hide
        Phabricator added a comment -

        mbautin requested code review of "[jira] HIVE-3913 Fix possible deadlock in ZK lock manager".
        Reviewers: ashutoshc, njain, kevinwilfong, JIRA

        ZK Hive lock manager can get into a state when the connection is closed, but no reconnection is attempted.

        TEST PLAN
        Run unit tests

        REVISION DETAIL
        https://reviews.facebook.net/D8097

        AFFECTED FILES
        ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java

        MANAGE HERALD DIFFERENTIAL RULES
        https://reviews.facebook.net/herald/view/differential/

        WHY DID I GET THIS EMAIL?
        https://reviews.facebook.net/herald/transcript/19533/

        To: ashutoshc, njain, kevinwilfong, JIRA, mbautin

        Show
        Phabricator added a comment - mbautin requested code review of " [jira] HIVE-3913 Fix possible deadlock in ZK lock manager". Reviewers: ashutoshc, njain, kevinwilfong, JIRA ZK Hive lock manager can get into a state when the connection is closed, but no reconnection is attempted. TEST PLAN Run unit tests REVISION DETAIL https://reviews.facebook.net/D8097 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/19533/ To: ashutoshc, njain, kevinwilfong, JIRA, mbautin
        Hide
        Mikhail Bautin added a comment -

        Ashutosh Chauhan: thanks for pointing this out. I took a look at HIVE-3537, and I think this is a much simpler fix for a situation that I observed when the client attempts to acquire a lock, but the ZK connection is closed for some reason and no reconnection is attempted.

        Show
        Mikhail Bautin added a comment - Ashutosh Chauhan : thanks for pointing this out. I took a look at HIVE-3537 , and I think this is a much simpler fix for a situation that I observed when the client attempts to acquire a lock, but the ZK connection is closed for some reason and no reconnection is attempted.
        Hide
        Phabricator added a comment -

        mbautin has commented on the revision "[jira] HIVE-3913 Fix possible deadlock in ZK lock manager".

        This patch also logs some exceptions that were previously ignored for easier debugging.

        REVISION DETAIL
        https://reviews.facebook.net/D8097

        To: ashutoshc, njain, kevinwilfong, JIRA, mbautin

        Show
        Phabricator added a comment - mbautin has commented on the revision " [jira] HIVE-3913 Fix possible deadlock in ZK lock manager". This patch also logs some exceptions that were previously ignored for easier debugging. REVISION DETAIL https://reviews.facebook.net/D8097 To: ashutoshc, njain, kevinwilfong, JIRA, mbautin
        Hide
        Ashutosh Chauhan added a comment -

        Mikhail,
        Change looks alright. But can you explain how you got in this situation, where connection was closed after it was already opened. Any easy way to reproduce the scenario?

        Show
        Ashutosh Chauhan added a comment - Mikhail, Change looks alright. But can you explain how you got in this situation, where connection was closed after it was already opened. Any easy way to reproduce the scenario?
        Hide
        Phabricator added a comment -

        ashutoshc has accepted the revision "[jira] HIVE-3913 Fix possible deadlock in ZK lock manager".

        +1

        REVISION DETAIL
        https://reviews.facebook.net/D8097

        BRANCH
        trunk-hive-3913-deadlock-in-zk-lock-manager

        To: ashutoshc, njain, kevinwilfong, JIRA, mbautin

        Show
        Phabricator added a comment - ashutoshc has accepted the revision " [jira] HIVE-3913 Fix possible deadlock in ZK lock manager". +1 REVISION DETAIL https://reviews.facebook.net/D8097 BRANCH trunk-hive-3913-deadlock-in-zk-lock-manager To: ashutoshc, njain, kevinwilfong, JIRA, mbautin
        Hide
        Ashutosh Chauhan added a comment -

        Committed to trunk. Thanks, Mikhail!

        Show
        Ashutosh Chauhan added a comment - Committed to trunk. Thanks, Mikhail!
        Hide
        Hudson added a comment -

        Integrated in hive-trunk-hadoop1 #41 (See https://builds.apache.org/job/hive-trunk-hadoop1/41/)
        HIVE-3913 : Possible deadlock in ZK lock manager (Mikhail Bautin via Ashutosh Chauhan) (Revision 1438116)

        Result = ABORTED
        hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438116
        Files :

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
        Show
        Hudson added a comment - Integrated in hive-trunk-hadoop1 #41 (See https://builds.apache.org/job/hive-trunk-hadoop1/41/ ) HIVE-3913 : Possible deadlock in ZK lock manager (Mikhail Bautin via Ashutosh Chauhan) (Revision 1438116) Result = ABORTED hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438116 Files : /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
        Hide
        Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #1935 (See https://builds.apache.org/job/Hive-trunk-h0.21/1935/)
        HIVE-3913 : Possible deadlock in ZK lock manager (Mikhail Bautin via Ashutosh Chauhan) (Revision 1438116)

        Result = SUCCESS
        hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438116
        Files :

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
        Show
        Hudson added a comment - Integrated in Hive-trunk-h0.21 #1935 (See https://builds.apache.org/job/Hive-trunk-h0.21/1935/ ) HIVE-3913 : Possible deadlock in ZK lock manager (Mikhail Bautin via Ashutosh Chauhan) (Revision 1438116) Result = SUCCESS hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438116 Files : /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
        Hide
        Hudson added a comment -

        Integrated in Hive-trunk-hadoop2 #86 (See https://builds.apache.org/job/Hive-trunk-hadoop2/86/)
        HIVE-3913 : Possible deadlock in ZK lock manager (Mikhail Bautin via Ashutosh Chauhan) (Revision 1438116)

        Result = FAILURE
        hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438116
        Files :

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
        Show
        Hudson added a comment - Integrated in Hive-trunk-hadoop2 #86 (See https://builds.apache.org/job/Hive-trunk-hadoop2/86/ ) HIVE-3913 : Possible deadlock in ZK lock manager (Mikhail Bautin via Ashutosh Chauhan) (Revision 1438116) Result = FAILURE hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438116 Files : /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java

          People

          • Assignee:
            Mikhail Bautin
            Reporter:
            Mikhail Bautin
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development