HBase / HBASE-28339

HBaseReplicationEndpoint creates new ZooKeeper client every time it tries to reconnect

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.6.0, 2.4.17, 3.0.0-beta-1, 2.5.7, 2.7.0
    • Fix Version/s: None
    • Component/s: Replication
    • Labels: None

    Description

      The abstract base class HBaseReplicationEndpoint, and therefore HBaseInterClusterReplicationEndpoint, creates a new ZooKeeper client instance every time a communication error occurs and it tries to reconnect. This was not a problem with ZooKeeper 3.4.x versions, because the TGT login thread was a static reference and was created only once for all clients in the same JVM. Since the upgrade to ZooKeeper 3.5.x the login thread is dedicated to the client instance, hence a new login thread is created every time the replication endpoint reconnects.

      /**
       * A private method used to re-establish a zookeeper session with a peer cluster.
       */
      protected void reconnect(KeeperException ke) {
        if (
          ke instanceof ConnectionLossException || ke instanceof SessionExpiredException
            || ke instanceof AuthFailedException
        ) {
          String clusterKey = ctx.getPeerConfig().getClusterKey();
          LOG.warn("Lost the ZooKeeper connection for peer " + clusterKey, ke);
          try {
            reloadZkWatcher();
          } catch (IOException io) {
            LOG.warn("Creation of ZookeeperWatcher failed for peer " + clusterKey, io);
          }
        }
      }
      /**
       * Closes the current ZKW (if not null) and creates a new one
       * @throws IOException If anything goes wrong connecting
       */
      synchronized void reloadZkWatcher() throws IOException {
        if (zkw != null) zkw.close();
        zkw = new ZKWatcher(ctx.getConfiguration(), "connection to cluster: " + ctx.getPeerId(), this);
        getZkw().registerListener(new PeerRegionServerListener(this));
      } 

      If the target cluster of replication is unavailable for some reason, the replication endpoint keeps trying to reconnect to ZooKeeper, constantly destroying and creating login threads, which floods the KDC host with login requests.
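
To make the scale of the problem concrete, here is a toy model of the two behaviours described above. It is only a sketch: `LoginChurnDemo` and `simulateReconnects` are hypothetical names, nothing here calls HBase or ZooKeeper, and a "login" is just a counter increment standing in for a request to the KDC.

```java
/** Toy model only: none of this is HBase or ZooKeeper code. */
public class LoginChurnDemo {

  /**
   * Returns how many simulated KDC logins happen for one initial connection
   * followed by the given number of reconnects.
   *
   * @param reconnects  number of times the client is closed and recreated
   * @param sharedLogin true models a single JVM-wide login (the 3.4.x behaviour
   *                    described in this issue), false models one login per
   *                    client instance (the 3.5.x behaviour described here)
   */
  public static int simulateReconnects(int reconnects, boolean sharedLogin) {
    int kdcLogins = 0;
    boolean sharedLoginDone = false;
    // Iteration 0 is the initial connection; each later one is a reconnect
    // that discards the old client and creates a new one.
    for (int i = 0; i <= reconnects; i++) {
      if (sharedLogin) {
        if (!sharedLoginDone) {
          kdcLogins++; // the first client in the JVM logs in once
          sharedLoginDone = true;
        }
      } else {
        kdcLogins++; // every new client instance logs in again
      }
    }
    return kdcLogins;
  }

  public static void main(String[] args) {
    System.out.println("shared login:     " + simulateReconnects(100, true));
    System.out.println("per-client login: " + simulateReconnects(100, false));
  }
}
```

In this model the shared case stays at one login no matter how often the endpoint reconnects, while the per-client case grows linearly with the number of reconnect attempts.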
       
      I'm not sure how to fix this yet; I'm trying to create a unit test first.
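
One possible building block for such a test is a helper that counts live threads by name before and after a reconnect. The sketch below uses only the JDK. `ThreadCountProbe` is a hypothetical name, and the thread name passed in is a placeholder: the actual name of the ZooKeeper login thread is not stated in this issue and would have to be looked up.

```java
import java.util.concurrent.CountDownLatch;

/** JDK-only sketch; the thread name used is a placeholder, not ZooKeeper's. */
public class ThreadCountProbe {

  /** Counts live threads whose name contains the given fragment. */
  public static long countLiveThreads(String nameFragment) {
    return Thread.getAllStackTraces().keySet().stream()
        .filter(Thread::isAlive)
        .filter(t -> t.getName().contains(nameFragment))
        .count();
  }

  /**
   * Starts one stand-in thread with the given name and returns the thread
   * counts taken before it starts, while it runs, and after it has stopped.
   */
  public static long[] probe(String threadName) {
    CountDownLatch stop = new CountDownLatch(1);
    long before = countLiveThreads(threadName);
    Thread worker = new Thread(() -> {
      try {
        stop.await(); // stay alive until the probe releases us
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, threadName);
    worker.setDaemon(true);
    worker.start();
    long during = countLiveThreads(threadName);
    stop.countDown();
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    long after = countLiveThreads(threadName);
    return new long[] {before, during, after};
  }

  public static void main(String[] args) {
    long[] counts = probe("demo-login-thread"); // placeholder name
    System.out.println(counts[0] + " -> " + counts[1] + " -> " + counts[2]);
  }
}
```

A real test would replace the stand-in thread with repeated calls to `reconnect` and assert that the count of login threads does not keep growing.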

            People

              Assignee: Andor Molnar
              Reporter: Andor Molnar
              Votes: 0
              Watchers: 3
