Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-21469

Review of ZooKeeperHiveLockManager

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Patch Available
    • Major
    • Resolution: Unresolved
    • 3.2.0, 4.0.0
    • None
    • Locking
    • None
    • Patch

    Description

      A lot of sins in this class to resolve:

        @Override
        public void setContext(HiveLockManagerCtx ctx) throws LockException {
       try {
            curatorFramework = CuratorFrameworkSingleton.getInstance(conf);
            parent = conf.getVar(HiveConf.ConfVars.HIVE_ZOOKEEPER_NAMESPACE);
            try{
              curatorFramework.create().withMode(CreateMode.PERSISTENT).forPath("/" +  parent, new byte[0]);
            } catch (Exception e) {
              // ignore if the parent already exists
              if (!(e instanceof KeeperException) || ((KeeperException)e).code() != KeeperException.Code.NODEEXISTS) {
                LOG.warn("Unexpected ZK exception when creating parent node /" + parent, e);
              }
            }
      

      Every time a new session is created and this setContext method is called, it attempts to create the root node. I have seen that, even though the root node exists, an create node action is written to the ZK logs. Check first if the node exists before trying to create it.

            try {
              curatorFramework.delete().forPath(zLock.getPath());
            } catch (InterruptedException ie) {
              curatorFramework.delete().forPath(zLock.getPath());
            }
      

      There has historically been a quite a few bugs regarding leaked locks. The Driver will signal the session Thread by performing an interrupt. That interrupt can happen any time and it can kill a create/delete action within the ZK framework. We can see one example of workaround for this. If the ZK action is interrupted, simply do it again. Well, what if it's interrupted yet again? The lock will be leaked. Also, when the InterruptedException is caught in the try block, the thread's interrupted flag is cleared. The flag is not reset in this code and therefore we lose the fact that this thread has been interrupted. This flag should be preserved so that other code paths will know that it's time to exit.

              if (tryNum > 1) {
                Thread.sleep(sleepTime);
              }
              unlockPrimitive(hiveLock, parent, curatorFramework);
              break;
            } catch (Exception e) {
              if (tryNum >= numRetriesForUnLock) {
                String name = ((ZooKeeperHiveLock)hiveLock).getPath();
                throw new LockException("Node " + name + " can not be deleted after " + numRetriesForUnLock + " attempts.",
                    e);
              }
            }
      

      ... related... the sleep here may be interrupted, but we still need to delete the lock (again, for fear of leaking it). This sleep should be uninterruptible. If we need to get the lock deleted, and there's a problem, interrupting the sleep will cause the code to eventually exit and locks will be leaked.

      It also requires a bunch more TLC.

      Attachments

        1. HIVE-21469.9.patch
          57 kB
          David Mollitor
        2. HIVE-21469.8.patch
          58 kB
          David Mollitor
        3. HIVE-21469.7.patch
          56 kB
          David Mollitor
        4. HIVE-21469.6.patch
          52 kB
          David Mollitor
        5. HIVE-21469.5.patch
          51 kB
          David Mollitor
        6. HIVE-21469.4.patch
          51 kB
          David Mollitor
        7. HIVE-21469.3.patch
          46 kB
          David Mollitor
        8. HIVE-21469.2.patch
          29 kB
          David Mollitor
        9. HIVE-21469.10.patch
          58 kB
          David Mollitor
        10. HIVE-21469.1.patch
          30 kB
          David Mollitor

        Activity

          People

            belugabehr David Mollitor
            belugabehr David Mollitor
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: