Hive / HIVE-5575

ZooKeeper connection closed when unlock with retry

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

      Description

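      See the attached screenshot. I encountered a scenario in which Hive retries unlocking all of its locks after the ZooKeeper session has already been closed. Every unlock attempt against the closed session fails, and the main thread sleeps between retries for each lock in turn. If there are hundreds of locks (for example, when writing to dynamic partitions), the process hangs for several days.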
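To see how hundreds of locks can add up to days, here is a rough worst-case estimate. It assumes, without confirmation from this report, that each of N locks is attempted R times with a sleep of s seconds before every attempt except the first, and that every attempt fails because the session is closed. The values N = 500, R = 10 and s = 60 are illustrative only; they are not measured values or confirmed Hive defaults.

```latex
T \approx N\,(R - 1)\,s
  = 500 \times 9 \times 60\ \text{s}
  = 270{,}000\ \text{s}
  = 75\ \text{h} \approx 3.1\ \text{days}
```

The time spent in the failing calls themselves is ignored, so this is a lower bound under the stated assumptions.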

      The thread dump of the hung Hive CLI process is:

      Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.21-b01 mixed mode):
      
      "Attach Listener" daemon prio=10 tid=0x000000000683f000 nid=0x34d0 waiting on condition [0x0000000000000000]
         java.lang.Thread.State: RUNNABLE
      
         Locked ownable synchronizers:
      	- None
      
      "LeaseChecker" daemon prio=10 tid=0x0000000006693800 nid=0x2713 waiting on condition [0x0000000042af7000]
         java.lang.Thread.State: TIMED_WAITING (sleeping)
      	at java.lang.Thread.sleep(Native Method)
      	at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1376)
      	at java.lang.Thread.run(Thread.java:722)
      
         Locked ownable synchronizers:
      	- None
      
      "Service Thread" daemon prio=10 tid=0x00002aaab8001000 nid=0x2651 runnable [0x0000000000000000]
         java.lang.Thread.State: RUNNABLE
      
         Locked ownable synchronizers:
      	- None
      
      "C2 CompilerThread1" daemon prio=10 tid=0x0000000005c7c800 nid=0x2650 waiting on condition [0x0000000000000000]
         java.lang.Thread.State: RUNNABLE
      
         Locked ownable synchronizers:
      	- None
      
      "C2 CompilerThread0" daemon prio=10 tid=0x0000000005c71000 nid=0x264f waiting on condition [0x0000000000000000]
         java.lang.Thread.State: RUNNABLE
      
         Locked ownable synchronizers:
      	- None
      
      "Signal Dispatcher" daemon prio=10 tid=0x0000000005c6f000 nid=0x264e runnable [0x0000000000000000]
         java.lang.Thread.State: RUNNABLE
      
         Locked ownable synchronizers:
      	- None
      
      "Finalizer" daemon prio=10 tid=0x0000000005c22000 nid=0x264d in Object.wait() [0x00000000427f4000]
         java.lang.Thread.State: WAITING (on object monitor)
      	at java.lang.Object.wait(Native Method)
      	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
      	- locked <0x000000078324b110> (a java.lang.ref.ReferenceQueue$Lock)
      	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
      	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)
      
         Locked ownable synchronizers:
      	- None
      
      "Reference Handler" daemon prio=10 tid=0x0000000005c1a000 nid=0x264c in Object.wait() [0x0000000041900000]
         java.lang.Thread.State: WAITING (on object monitor)
      	at java.lang.Object.wait(Native Method)
      	at java.lang.Object.wait(Object.java:503)
      	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
      	- locked <0x000000078328fbc0> (a java.lang.ref.Reference$Lock)
      
         Locked ownable synchronizers:
      	- None
      
      "main" prio=10 tid=0x0000000005b76800 nid=0x263d waiting on condition [0x0000000040f46000]
         java.lang.Thread.State: TIMED_WAITING (sleeping)
      	at java.lang.Thread.sleep(Native Method)
      	at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.unlockWithRetry(ZooKeeperHiveLockManager.java:426)
      	at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.unlock(ZooKeeperHiveLockManager.java:415)
      	at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.releaseLocks(ZooKeeperHiveLockManager.java:257)
      	at org.apache.hadoop.hive.ql.Driver.releaseLocks(Driver.java:864)
      	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:953)
      	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
      	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
      	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
      	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
      	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:446)
      	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:456)
      	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:712)
      	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:601)
      	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
      
         Locked ownable synchronizers:
      	- None
      
      "VM Thread" prio=10 tid=0x0000000005c12800 nid=0x264b runnable 
      
      "GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000005b84800 nid=0x263e runnable 
      
      "GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000005b86000 nid=0x263f runnable 
      
      "GC task thread#2 (ParallelGC)" prio=10 tid=0x0000000005b88000 nid=0x2640 runnable 
      
      "GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000005b8a000 nid=0x2641 runnable 
      
      "GC task thread#4 (ParallelGC)" prio=10 tid=0x0000000005b8b800 nid=0x2642 runnable 
      
      "GC task thread#5 (ParallelGC)" prio=10 tid=0x0000000005b8d800 nid=0x2643 runnable 
      
      "GC task thread#6 (ParallelGC)" prio=10 tid=0x0000000005b8f800 nid=0x2644 runnable 
      
      "GC task thread#7 (ParallelGC)" prio=10 tid=0x0000000005b91000 nid=0x2645 runnable 
      
      "GC task thread#8 (ParallelGC)" prio=10 tid=0x0000000005b93000 nid=0x2646 runnable 
      
      "GC task thread#9 (ParallelGC)" prio=10 tid=0x0000000005b95000 nid=0x2647 runnable 
      
      "GC task thread#10 (ParallelGC)" prio=10 tid=0x0000000005b96800 nid=0x2648 runnable 
      
      "GC task thread#11 (ParallelGC)" prio=10 tid=0x0000000005b98800 nid=0x2649 runnable 
      
      "GC task thread#12 (ParallelGC)" prio=10 tid=0x0000000005b9a800 nid=0x264a runnable 
      
      "VM Periodic Task Thread" prio=10 tid=0x00002aaab800c000 nid=0x2652 waiting on condition 
      
      JNI global references: 294
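In the dump above, the "main" thread is sleeping inside ZooKeeperHiveLockManager.unlockWithRetry, called from releaseLocks. The HIVE-5575 patch itself is not reproduced on this page, so the following is only a hypothetical sketch of one way to avoid this failure mode: before each attempt, check whether the session is closed and open a new one, rather than retrying against a session that can never recover. LockClient, UnlockRetrier and UnlockFailedException are invented for illustration and are not Hive or ZooKeeper APIs. With the real ZooKeeper client, the closed check would presumably correspond to ZooKeeper.getState() returning ZooKeeper.States.CLOSED.

```java
import java.util.function.Supplier;

// Hypothetical minimal view of a ZooKeeper-backed lock session (not a real Hive/ZooKeeper API).
interface LockClient {
    /** True once the underlying session is closed; a closed session never reopens. */
    boolean isClosed();

    /** Removes one lock node; throws if the session cannot serve the request. */
    void delete(String lockPath) throws Exception;
}

// Hypothetical unchecked failure type used by this sketch.
final class UnlockFailedException extends RuntimeException {
    UnlockFailedException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class UnlockRetrier {
    private final Supplier<LockClient> connect; // opens a brand-new session
    private final int maxAttempts;
    private final long sleepMillis;
    private LockClient client;

    UnlockRetrier(Supplier<LockClient> connect, int maxAttempts, long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.connect = connect;
        this.maxAttempts = maxAttempts;
        this.sleepMillis = sleepMillis;
        this.client = connect.get();
    }

    /** Deletes one lock, reopening the session when it is closed instead of retrying on it. */
    void unlockWithRetry(String lockPath) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (attempt > 1) {
                try {
                    Thread.sleep(sleepMillis); // back off between attempts
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    throw new UnlockFailedException("interrupted while unlocking " + lockPath, e);
                }
            }
            if (client.isClosed()) {
                client = connect.get(); // retrying on a closed session can never succeed
            }
            try {
                client.delete(lockPath);
                return; // success
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw new UnlockFailedException(
            "failed to unlock " + lockPath + " after " + maxAttempts + " attempts", last);
    }
}
```

Because the closed check runs before every attempt, a session that was already closed when unlocking starts costs no failed attempts, instead of the full retry budget for every lock.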
      
      Attachments
      1. D13515.1.patch (1 kB, uploaded by Phabricator)
      2. HIVE-5575.patch (1 kB, uploaded by Chun Chen)
      3. zookeeper session closed.png (327 kB, uploaded by Chun Chen)

        Activity

        Phabricator added a comment -

        chenchun requested code review of "HIVE-5575 [jira] ZooKeeper connection closed when unlock with retry".

        Reviewers: JIRA

        lock

        See the attached screenshot. I encountered a scenario in which Hive retries unlocking all of its locks after the ZooKeeper session has already been closed. If there are hundreds of locks (for example, when writing to dynamic partitions), the process hangs for several days.

        TEST PLAN
        EMPTY

        REVISION DETAIL
        https://reviews.facebook.net/D13515

        AFFECTED FILES
        ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java

        To: JIRA, chenchun

        Hive QA added a comment -

        Overall: +1 all checks pass

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12608914/HIVE-5575.patch

        SUCCESS: +1 4415 tests passed

        Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1159/testReport
        Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1159/console

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        

        This message is automatically generated.

        Brock Noland added a comment -

        +1

        Brock Noland added a comment -

        Thank you very much for the contribution, Chun! I have committed your patch to trunk!

        Chun Chen added a comment -

        Thanks for reviewing the code, Brock Noland.

        Hudson added a comment -

        FAILURE: Integrated in Hive-trunk-hadoop2 #510 (See https://builds.apache.org/job/Hive-trunk-hadoop2/510/)
        HIVE-5575: ZooKeeper connection closed when unlock with retry (Chun Chen via Brock Noland) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1533511)

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
        Hudson added a comment -

        FAILURE: Integrated in Hive-trunk-h0.21 #2407 (See https://builds.apache.org/job/Hive-trunk-h0.21/2407/)
        HIVE-5575: ZooKeeper connection closed when unlock with retry (Chun Chen via Brock Noland) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1533511)

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
        Hudson added a comment -

        FAILURE: Integrated in Hive-trunk-hadoop2-ptest #145 (See https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/145/)
        HIVE-5575: ZooKeeper connection closed when unlock with retry (Chun Chen via Brock Noland) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1533511)

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java
        Hudson added a comment -

        SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #208 (See https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/208/)
        HIVE-5575: ZooKeeper connection closed when unlock with retry (Chun Chen via Brock Noland) (brock: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1533511)

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java

          People

          • Assignee: Chun Chen
            Reporter: Chun Chen
          • Votes: 0
            Watchers: 4

            Dates

            • Created:
              Updated:
              Resolved:
