Hive
  1. Hive
  2. HIVE-5575

ZooKeeper connection closed when unlock with retry

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      See the attachment, I have encountered a scenario that hive retries to unlock all locks, but zookeeper session is closed. If there are hundreds of locks, say dynamic partition, the process will hang up for several days.

      The stack is

      Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.21-b01 mixed mode):
      
      "Attach Listener" daemon prio=10 tid=0x000000000683f000 nid=0x34d0 waiting on condition [0x0000000000000000]
         java.lang.Thread.State: RUNNABLE
      
         Locked ownable synchronizers:
      	- None
      
      "LeaseChecker" daemon prio=10 tid=0x0000000006693800 nid=0x2713 waiting on condition [0x0000000042af7000]
         java.lang.Thread.State: TIMED_WAITING (sleeping)
      	at java.lang.Thread.sleep(Native Method)
      	at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1376)
      	at java.lang.Thread.run(Thread.java:722)
      
         Locked ownable synchronizers:
      	- None
      
      "Service Thread" daemon prio=10 tid=0x00002aaab8001000 nid=0x2651 runnable [0x0000000000000000]
         java.lang.Thread.State: RUNNABLE
      
         Locked ownable synchronizers:
      	- None
      
      "C2 CompilerThread1" daemon prio=10 tid=0x0000000005c7c800 nid=0x2650 waiting on condition [0x0000000000000000]
         java.lang.Thread.State: RUNNABLE
      
         Locked ownable synchronizers:
      	- None
      
      "C2 CompilerThread0" daemon prio=10 tid=0x0000000005c71000 nid=0x264f waiting on condition [0x0000000000000000]
         java.lang.Thread.State: RUNNABLE
      
         Locked ownable synchronizers:
      	- None
      
      "Signal Dispatcher" daemon prio=10 tid=0x0000000005c6f000 nid=0x264e runnable [0x0000000000000000]
         java.lang.Thread.State: RUNNABLE
      
         Locked ownable synchronizers:
      	- None
      
      "Finalizer" daemon prio=10 tid=0x0000000005c22000 nid=0x264d in Object.wait() [0x00000000427f4000]
         java.lang.Thread.State: WAITING (on object monitor)
      	at java.lang.Object.wait(Native Method)
      	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
      	- locked <0x000000078324b110> (a java.lang.ref.ReferenceQueue$Lock)
      	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
      	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)
      
         Locked ownable synchronizers:
      	- None
      
      "Reference Handler" daemon prio=10 tid=0x0000000005c1a000 nid=0x264c in Object.wait() [0x0000000041900000]
         java.lang.Thread.State: WAITING (on object monitor)
      	at java.lang.Object.wait(Native Method)
      	at java.lang.Object.wait(Object.java:503)
      	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
      	- locked <0x000000078328fbc0> (a java.lang.ref.Reference$Lock)
      
         Locked ownable synchronizers:
      	- None
      
      "main" prio=10 tid=0x0000000005b76800 nid=0x263d waiting on condition [0x0000000040f46000]
         java.lang.Thread.State: TIMED_WAITING (sleeping)
      	at java.lang.Thread.sleep(Native Method)
      	at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.unlockWithRetry(ZooKeeperHiveLockManager.java:426)
      	at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.unlock(ZooKeeperHiveLockManager.java:415)
      	at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.releaseLocks(ZooKeeperHiveLockManager.java:257)
      	at org.apache.hadoop.hive.ql.Driver.releaseLocks(Driver.java:864)
      	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:953)
      	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
      	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
      	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
      	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
      	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:446)
      	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:456)
      	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:712)
      	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:601)
      	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
      
         Locked ownable synchronizers:
      	- None
      
      "VM Thread" prio=10 tid=0x0000000005c12800 nid=0x264b runnable 
      
      "GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000005b84800 nid=0x263e runnable 
      
      "GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000005b86000 nid=0x263f runnable 
      
      "GC task thread#2 (ParallelGC)" prio=10 tid=0x0000000005b88000 nid=0x2640 runnable 
      
      "GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000005b8a000 nid=0x2641 runnable 
      
      "GC task thread#4 (ParallelGC)" prio=10 tid=0x0000000005b8b800 nid=0x2642 runnable 
      
      "GC task thread#5 (ParallelGC)" prio=10 tid=0x0000000005b8d800 nid=0x2643 runnable 
      
      "GC task thread#6 (ParallelGC)" prio=10 tid=0x0000000005b8f800 nid=0x2644 runnable 
      
      "GC task thread#7 (ParallelGC)" prio=10 tid=0x0000000005b91000 nid=0x2645 runnable 
      
      "GC task thread#8 (ParallelGC)" prio=10 tid=0x0000000005b93000 nid=0x2646 runnable 
      
      "GC task thread#9 (ParallelGC)" prio=10 tid=0x0000000005b95000 nid=0x2647 runnable 
      
      "GC task thread#10 (ParallelGC)" prio=10 tid=0x0000000005b96800 nid=0x2648 runnable 
      
      "GC task thread#11 (ParallelGC)" prio=10 tid=0x0000000005b98800 nid=0x2649 runnable 
      
      "GC task thread#12 (ParallelGC)" prio=10 tid=0x0000000005b9a800 nid=0x264a runnable 
      
      "VM Periodic Task Thread" prio=10 tid=0x00002aaab800c000 nid=0x2652 waiting on condition 
      
      JNI global references: 294
      
      1. D13515.1.patch
        1 kB
        Phabricator
      2. HIVE-5575.patch
        1 kB
        Chun Chen
      3. zookeeper session closed.png
        327 kB
        Chun Chen

        Activity

        Chun Chen created issue -
        Phabricator made changes -
        Field Original Value New Value
        Attachment D13515.1.patch [ 12608911 ]
        Chun Chen made changes -
        Attachment zookeeper session closed.png [ 12608913 ]
        Chun Chen made changes -
        Description See the attachment, I have encountered a scenario that hive retries to unlock all locks, but zookeeper session is closed. If there are hundreds of locks, say dynamic partition, the process will hang up for several days. See the attachment, I have encountered a scenario that hive retries to unlock all locks, but zookeeper session is closed. If there are hundreds of locks, say dynamic partition, the process will hang up for several days.

        The stack is
        {code}
        Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.21-b01 mixed mode):

        "Attach Listener" daemon prio=10 tid=0x000000000683f000 nid=0x34d0 waiting on condition [0x0000000000000000]
           java.lang.Thread.State: RUNNABLE

           Locked ownable synchronizers:
        - None

        "LeaseChecker" daemon prio=10 tid=0x0000000006693800 nid=0x2713 waiting on condition [0x0000000042af7000]
           java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1376)
        at java.lang.Thread.run(Thread.java:722)

           Locked ownable synchronizers:
        - None

        "Service Thread" daemon prio=10 tid=0x00002aaab8001000 nid=0x2651 runnable [0x0000000000000000]
           java.lang.Thread.State: RUNNABLE

           Locked ownable synchronizers:
        - None

        "C2 CompilerThread1" daemon prio=10 tid=0x0000000005c7c800 nid=0x2650 waiting on condition [0x0000000000000000]
           java.lang.Thread.State: RUNNABLE

           Locked ownable synchronizers:
        - None

        "C2 CompilerThread0" daemon prio=10 tid=0x0000000005c71000 nid=0x264f waiting on condition [0x0000000000000000]
           java.lang.Thread.State: RUNNABLE

           Locked ownable synchronizers:
        - None

        "Signal Dispatcher" daemon prio=10 tid=0x0000000005c6f000 nid=0x264e runnable [0x0000000000000000]
           java.lang.Thread.State: RUNNABLE

           Locked ownable synchronizers:
        - None

        "Finalizer" daemon prio=10 tid=0x0000000005c22000 nid=0x264d in Object.wait() [0x00000000427f4000]
           java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
        - locked <0x000000078324b110> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)

           Locked ownable synchronizers:
        - None

        "Reference Handler" daemon prio=10 tid=0x0000000005c1a000 nid=0x264c in Object.wait() [0x0000000041900000]
           java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
        - locked <0x000000078328fbc0> (a java.lang.ref.Reference$Lock)

           Locked ownable synchronizers:
        - None

        "main" prio=10 tid=0x0000000005b76800 nid=0x263d waiting on condition [0x0000000040f46000]
           java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.unlockWithRetry(ZooKeeperHiveLockManager.java:426)
        at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.unlock(ZooKeeperHiveLockManager.java:415)
        at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.releaseLocks(ZooKeeperHiveLockManager.java:257)
        at org.apache.hadoop.hive.ql.Driver.releaseLocks(Driver.java:864)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:953)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
        at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:446)
        at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:456)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:712)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

           Locked ownable synchronizers:
        - None

        "VM Thread" prio=10 tid=0x0000000005c12800 nid=0x264b runnable

        "GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000005b84800 nid=0x263e runnable

        "GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000005b86000 nid=0x263f runnable

        "GC task thread#2 (ParallelGC)" prio=10 tid=0x0000000005b88000 nid=0x2640 runnable

        "GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000005b8a000 nid=0x2641 runnable

        "GC task thread#4 (ParallelGC)" prio=10 tid=0x0000000005b8b800 nid=0x2642 runnable

        "GC task thread#5 (ParallelGC)" prio=10 tid=0x0000000005b8d800 nid=0x2643 runnable

        "GC task thread#6 (ParallelGC)" prio=10 tid=0x0000000005b8f800 nid=0x2644 runnable

        "GC task thread#7 (ParallelGC)" prio=10 tid=0x0000000005b91000 nid=0x2645 runnable

        "GC task thread#8 (ParallelGC)" prio=10 tid=0x0000000005b93000 nid=0x2646 runnable

        "GC task thread#9 (ParallelGC)" prio=10 tid=0x0000000005b95000 nid=0x2647 runnable

        "GC task thread#10 (ParallelGC)" prio=10 tid=0x0000000005b96800 nid=0x2648 runnable

        "GC task thread#11 (ParallelGC)" prio=10 tid=0x0000000005b98800 nid=0x2649 runnable

        "GC task thread#12 (ParallelGC)" prio=10 tid=0x0000000005b9a800 nid=0x264a runnable

        "VM Periodic Task Thread" prio=10 tid=0x00002aaab800c000 nid=0x2652 waiting on condition

        JNI global references: 294
        {code}
        Chun Chen made changes -
        Attachment HIVE-5575.patch [ 12608914 ]
        Chun Chen made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Brock Noland made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]

          People

          • Assignee:
            Chun Chen
            Reporter:
            Chun Chen
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development