  Phoenix / PHOENIX-7233

CQSI openConnection should timeout to unblock other connection threads


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • 5.1.3
    • None
    • None
    • None

    Description

      PhoenixDriver initializes ConnectionQueryServices objects and caches them in connectionQueryServicesCache. As part of CQSI initialization, a connection to the HBase server is opened through the ConnectionFactory provided by the HBase client, which returns a Connection object to the client. This HBase Connection object lets clients share the ZooKeeper connection, the meta cache, and remote connections to RegionServers and Master daemons. It is used to perform table CRUD operations as well as administrative actions on the cluster.

      Initializing an HBase Connection requires the cluster ID, which is maintained in ZooKeeper, on the Master daemons, or both. The client retrieves it from one or the other depending on whether it is configured to use ZKConnectionRegistry or MasterRegistry/RpcConnectionRegistry.

      With ZKConnectionRegistry, we have run into an edge case in which the connection to the ZooKeeper server was stuck for more than 12 hours. While the client was connecting to the ZooKeeper quorum to retrieve the cluster ID, the ZooKeeper leader switched from one server to another. Why the leader switch left the connection stuck still requires a root cause analysis (RCA), but regardless of the cause, the Phoenix/HBase client should not wait indefinitely for a response from ZooKeeper without any connection timeout.

      On the Phoenix client, if one thread gets stuck opening the connection during CQSI#init, every other thread trying to create a connection also gets stuck, because we take a class-level lock before opening the connection. With all connection threads blocked, the client JVM can degrade or even be terminated.
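To make this failure mode concrete, here is a minimal, JDK-only sketch; it does not use any Phoenix or HBase classes. A `synchronized` block on a static object stands in for the class-level lock, and a `CountDownLatch` that is only released at the end of the demo stands in for an `openConnection` call that never returns. `ClassLockContentionDemo`, `INIT_LOCK`, `init` and `demonstrate` are hypothetical names chosen for illustration.

```java
import java.util.concurrent.CountDownLatch;

public class ClassLockContentionDemo {

    // Hypothetical stand-in for the class-level lock taken before opening the connection.
    private static final Object INIT_LOCK = new Object();

    // Simulated init: holds the class-level lock while waiting for a "connection"
    // that only completes once connectionDone is counted down.
    static void init(CountDownLatch connectionDone) {
        synchronized (INIT_LOCK) {
            try {
                connectionDone.await(); // no timeout: waits as long as the connection is stuck
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Polls until the thread reaches the expected state or the deadline passes.
    static boolean awaitState(Thread t, Thread.State expected, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (t.getState() == expected) {
                return true;
            }
            Thread.sleep(10);
        }
        return t.getState() == expected;
    }

    // Returns the state of a second connection thread while the first one is stuck.
    static Thread.State demonstrate() throws InterruptedException {
        CountDownLatch stuckConnection = new CountDownLatch(1);
        Thread first = new Thread(() -> init(stuckConnection), "conn-1");
        Thread second = new Thread(() -> init(stuckConnection), "conn-2");
        first.start();
        awaitState(first, Thread.State.WAITING, 5_000);  // first holds INIT_LOCK and waits
        second.start();
        awaitState(second, Thread.State.BLOCKED, 5_000); // second cannot enter the lock
        Thread.State observed = second.getState();
        // Release the simulated connection so both threads can finish.
        stuckConnection.countDown();
        first.join();
        second.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("second thread while first is stuck: " + demonstrate());
    }
}
```

Running `main` prints `second thread while first is stuck: BLOCKED`: the second thread never reaches its own connection attempt because it cannot acquire the lock held by the stuck thread.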

      The HBase client should also apply a timeout, but the lack of a timeout on the Phoenix client side has far worse consequences, since a single stuck call blocks every other connection attempt. As part of this Jira, we should introduce a way for CQSI#openConnection to time out, either by using the CompletableFuture API or by using our preconfigured thread pool.
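One possible shape for the proposed change, sketched with only the JDK: the connection attempt runs on a thread pool and the caller waits with `Future.get(timeout, unit)`, so it regains control (and can release the class-level lock) even if the attempt never finishes. `TimedOpenConnection`, `openWithTimeout` and `ConnectionTimeoutException` are hypothetical names, not Phoenix APIs. In CQSI the `Callable` would presumably wrap the existing `ConnectionFactory.createConnection(config)` call, and the timeout value would have to come from some client configuration property that is assumed here, not taken from Phoenix.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedOpenConnection {

    // Hypothetical exception surfaced to callers instead of hanging forever.
    static class ConnectionTimeoutException extends Exception {
        ConnectionTimeoutException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Runs opener on pool and waits at most timeoutMs for its result.
    static <T> T openWithTimeout(Callable<T> opener, ExecutorService pool, long timeoutMs)
            throws ConnectionTimeoutException, InterruptedException, ExecutionException {
        Future<T> future = pool.submit(opener);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupt the worker; whether the blocked call honours the
            // interrupt depends on the code it is stuck in.
            future.cancel(true);
            throw new ConnectionTimeoutException(
                    "openConnection did not complete within " + timeoutMs + " ms", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Daemon workers, so a task that ignores the interrupt cannot keep the JVM alive.
        ExecutorService pool = Executors.newFixedThreadPool(2, r -> {
            Thread t = new Thread(r, "open-connection-worker");
            t.setDaemon(true);
            return t;
        });
        try {
            System.out.println("opened: " + openWithTimeout(() -> "connection-1", pool, 1_000));
            try {
                // Simulates a connection attempt stuck waiting on ZooKeeper.
                openWithTimeout(() -> {
                    Thread.sleep(60_000);
                    return "never";
                }, pool, 200);
            } catch (ConnectionTimeoutException e) {
                System.out.println("timed out: " + e.getMessage());
            }
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The sketch takes the thread-pool route because `Future.cancel(true)` on a task submitted to an `ExecutorService` interrupts the worker thread. `CompletableFuture.supplyAsync(supplier, pool).get(timeout, unit)` gives the same bounded wait, but `CompletableFuture.cancel(true)` does not interrupt the running task, so a stuck attempt keeps occupying its pool thread until it returns on its own.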

       

      Stack trace for reference:

       

      jdk.internal.misc.Unsafe.park
      java.util.concurrent.locks.LockSupport.park
      java.util.concurrent.CompletableFuture$Signaller.block
      java.util.concurrent.ForkJoinPool.managedBlock
      java.util.concurrent.CompletableFuture.waitingGet
      java.util.concurrent.CompletableFuture.get
      org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId
      org.apache.hadoop.hbase.client.ConnectionImplementation.<init>
      jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance?
      jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance
      jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance
      java.lang.reflect.Constructor.newInstance
      org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$?
      org.apache.hadoop.hbase.client.ConnectionFactory$$Lambda$?.run
      java.security.AccessController.doPrivileged
      javax.security.auth.Subject.doAs
      org.apache.hadoop.security.UserGroupInformation.doAs
      org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs
      org.apache.hadoop.hbase.client.ConnectionFactory.createConnection
      org.apache.hadoop.hbase.client.ConnectionFactory.createConnection
      org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection
      org.apache.phoenix.query.ConnectionQueryServicesImpl.access$?
      org.apache.phoenix.query.ConnectionQueryServicesImpl$?.call
      org.apache.phoenix.query.ConnectionQueryServicesImpl$?.call
      org.apache.phoenix.util.PhoenixContextExecutor.call
      org.apache.phoenix.query.ConnectionQueryServicesImpl.init
      org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices
      org.apache.phoenix.jdbc.HighAvailabilityGroup.connectToOneCluster
      org.apache.phoenix.jdbc.ParallelPhoenixConnection.getConnection
      org.apache.phoenix.jdbc.ParallelPhoenixConnection.lambda$new$?
      org.apache.phoenix.jdbc.ParallelPhoenixConnection$$Lambda$?.get
      org.apache.phoenix.jdbc.ParallelPhoenixContext.lambda$chainOnConnClusterContext$?
      org.apache.phoenix.jdbc.ParallelPhoenixContext$$Lambda$?.apply 

       

       


            People

              Assignee: Unassigned
              Reporter: Viraj Jasani (vjasani)
              Votes: 0
              Watchers: 2
