Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-5575

ZooKeeper connection closed when unlock with retry

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 0.11.0
    • 0.13.0
    • None
    • None

    Description

      See the attachment, I have encountered a scenario that hive retries to unlock all locks, but zookeeper session is closed. If there are hundreds of locks, say dynamic partition, the process will hang up for several days.

      The stack is

      Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.21-b01 mixed mode):
      
      "Attach Listener" daemon prio=10 tid=0x000000000683f000 nid=0x34d0 waiting on condition [0x0000000000000000]
         java.lang.Thread.State: RUNNABLE
      
         Locked ownable synchronizers:
      	- None
      
      "LeaseChecker" daemon prio=10 tid=0x0000000006693800 nid=0x2713 waiting on condition [0x0000000042af7000]
         java.lang.Thread.State: TIMED_WAITING (sleeping)
      	at java.lang.Thread.sleep(Native Method)
      	at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1376)
      	at java.lang.Thread.run(Thread.java:722)
      
         Locked ownable synchronizers:
      	- None
      
      "Service Thread" daemon prio=10 tid=0x00002aaab8001000 nid=0x2651 runnable [0x0000000000000000]
         java.lang.Thread.State: RUNNABLE
      
         Locked ownable synchronizers:
      	- None
      
      "C2 CompilerThread1" daemon prio=10 tid=0x0000000005c7c800 nid=0x2650 waiting on condition [0x0000000000000000]
         java.lang.Thread.State: RUNNABLE
      
         Locked ownable synchronizers:
      	- None
      
      "C2 CompilerThread0" daemon prio=10 tid=0x0000000005c71000 nid=0x264f waiting on condition [0x0000000000000000]
         java.lang.Thread.State: RUNNABLE
      
         Locked ownable synchronizers:
      	- None
      
      "Signal Dispatcher" daemon prio=10 tid=0x0000000005c6f000 nid=0x264e runnable [0x0000000000000000]
         java.lang.Thread.State: RUNNABLE
      
         Locked ownable synchronizers:
      	- None
      
      "Finalizer" daemon prio=10 tid=0x0000000005c22000 nid=0x264d in Object.wait() [0x00000000427f4000]
         java.lang.Thread.State: WAITING (on object monitor)
      	at java.lang.Object.wait(Native Method)
      	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
      	- locked <0x000000078324b110> (a java.lang.ref.ReferenceQueue$Lock)
      	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
      	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)
      
         Locked ownable synchronizers:
      	- None
      
      "Reference Handler" daemon prio=10 tid=0x0000000005c1a000 nid=0x264c in Object.wait() [0x0000000041900000]
         java.lang.Thread.State: WAITING (on object monitor)
      	at java.lang.Object.wait(Native Method)
      	at java.lang.Object.wait(Object.java:503)
      	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
      	- locked <0x000000078328fbc0> (a java.lang.ref.Reference$Lock)
      
         Locked ownable synchronizers:
      	- None
      
      "main" prio=10 tid=0x0000000005b76800 nid=0x263d waiting on condition [0x0000000040f46000]
         java.lang.Thread.State: TIMED_WAITING (sleeping)
      	at java.lang.Thread.sleep(Native Method)
      	at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.unlockWithRetry(ZooKeeperHiveLockManager.java:426)
      	at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.unlock(ZooKeeperHiveLockManager.java:415)
      	at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.releaseLocks(ZooKeeperHiveLockManager.java:257)
      	at org.apache.hadoop.hive.ql.Driver.releaseLocks(Driver.java:864)
      	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:953)
      	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
      	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
      	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
      	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
      	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:446)
      	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:456)
      	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:712)
      	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:601)
      	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
      
         Locked ownable synchronizers:
      	- None
      
      "VM Thread" prio=10 tid=0x0000000005c12800 nid=0x264b runnable 
      
      "GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000005b84800 nid=0x263e runnable 
      
      "GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000005b86000 nid=0x263f runnable 
      
      "GC task thread#2 (ParallelGC)" prio=10 tid=0x0000000005b88000 nid=0x2640 runnable 
      
      "GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000005b8a000 nid=0x2641 runnable 
      
      "GC task thread#4 (ParallelGC)" prio=10 tid=0x0000000005b8b800 nid=0x2642 runnable 
      
      "GC task thread#5 (ParallelGC)" prio=10 tid=0x0000000005b8d800 nid=0x2643 runnable 
      
      "GC task thread#6 (ParallelGC)" prio=10 tid=0x0000000005b8f800 nid=0x2644 runnable 
      
      "GC task thread#7 (ParallelGC)" prio=10 tid=0x0000000005b91000 nid=0x2645 runnable 
      
      "GC task thread#8 (ParallelGC)" prio=10 tid=0x0000000005b93000 nid=0x2646 runnable 
      
      "GC task thread#9 (ParallelGC)" prio=10 tid=0x0000000005b95000 nid=0x2647 runnable 
      
      "GC task thread#10 (ParallelGC)" prio=10 tid=0x0000000005b96800 nid=0x2648 runnable 
      
      "GC task thread#11 (ParallelGC)" prio=10 tid=0x0000000005b98800 nid=0x2649 runnable 
      
      "GC task thread#12 (ParallelGC)" prio=10 tid=0x0000000005b9a800 nid=0x264a runnable 
      
      "VM Periodic Task Thread" prio=10 tid=0x00002aaab800c000 nid=0x2652 waiting on condition 
      
      JNI global references: 294
      

      Attachments

        1. D13515.1.patch
          1 kB
          Phabricator
        2. HIVE-5575.patch
          1 kB
          Chun Chen
        3. zookeeper session closed.png
          327 kB
          Chun Chen

        Activity

          People

            chenchun Chun Chen
            chenchun Chun Chen
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: