  1. HBase
  2. HBASE-26088

conn.getBufferedMutator(tableName) leaks thread executors and other problems


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.5.0, 2.3.6, 2.4.5
    • Component/s: Client
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      The API doc for Connection#getBufferedMutator(TableName) and Connection#getBufferedMutator(BufferedMutatorParams) stated that when the user does not pass a ThreadPool, the ThreadPool in the Connection is used. In reality, a new ThreadPool was created in such cases.

      The behaviour of the code is kept as is, but the Javadoc has been corrected, and a bug where this new pool was not closed when closing the BufferedMutator has been fixed.

    Description

      TL;DR: conn.getBufferedMutator(tableName) is dangerous in hbase client 2.4.4 and doesn't match documented behavior in 1.4.13.

      To work around the problems until they are fixed, do this:

      var mySingletonPool = HTable.getDefaultExecutor(hbaseConf);
      var params = new BufferedMutatorParams(tableName);
      params.pool(mySingletonPool);
      var myMutator = conn.getBufferedMutator(params);
      

      And avoid code like this:

      var myMutator = conn.getBufferedMutator(tableName);
      

      The full story:

      My application started leaking threads after upgrading from hbase client 1.4.13 to 2.4.4. The leak is so severe that after less than a minute of runtime more than 30k threads are leaked and all available virtual memory on the box (> 50 GB) is consumed. Other processes on the box start crashing with memory allocation errors. Even running ls at the shell fails with OS resource allocation failures.

      A thread dump after just a few seconds of runtime shows thousands of threads like this:

      "htable-pool-0" #8841 prio=5 os_prio=0 cpu=0.15ms elapsed=7.49s tid=0x00007efb6d2a1000 nid=0x57d2 waiting on condition [0x00007ef8a6c38000]
       java.lang.Thread.State: TIMED_WAITING (parking)
       at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
       - parking to wait for <0x00000007e7cd6188> (a java.util.concurrent.SynchronousQueue$TransferStack)
       at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6/LockSupport.java:234)
       at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.base@11.0.6/SynchronousQueue.java:462)
       at java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.base@11.0.6/SynchronousQueue.java:361)
       at java.util.concurrent.SynchronousQueue.poll(java.base@11.0.6/SynchronousQueue.java:937)
       at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@11.0.6/ThreadPoolExecutor.java:1053)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.6/ThreadPoolExecutor.java:1114)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.6/ThreadPoolExecutor.java:628)
       at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
      

       

      Note: All the threads are labeled htable-pool-0. That suggests we're leaking thread executors, not just threads. The htable-pool part indicates the problem has to do with HTable.getDefaultExecutor(conf), and the only part of my code that interacts with that is a call to conn.getBufferedMutator(tableName).
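
      The same observation can be made from inside a running JVM instead of from a thread dump. The following is a minimal diagnostic sketch, not part of the HBase client: the class and method names are hypothetical, and it uses only the standard Thread.getAllStackTraces() call to count live threads per name. A large count for a single name such as htable-pool-0 would point at many separate pools rather than one busy pool.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: counts live JVM threads per thread name. */
final class ThreadNameCensus {

    /** Returns name -> number of live threads, restricted to names starting with prefix. */
    static Map<String, Integer> countByName(String prefix) {
        Map<String, Integer> counts = new TreeMap<>();
        // getAllStackTraces() returns one entry per live thread.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith(prefix)) {
                counts.merge(t.getName(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // "htable-pool" is the prefix seen in the thread dump in this report.
        countByName("htable-pool")
            .forEach((name, n) -> System.out.println(name + " x" + n));
    }
}
```

      Calling countByName("htable-pool") periodically (for example from a health check) should make a growing leak visible well before the box runs out of memory.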

       

      Looking at the hbase client code shows a few problems:

      1) Neither 1.4.13 nor 2.4.4's behavior matches the documentation for conn.getBufferedMutator(tableName) which says:

      This BufferedMutator will use the Connection's ExecutorService.

      That suggests some singleton thread executor is being used, which is not the case.

       

      2) Under 1.4.13 you get a new ThreadPoolExecutor for every BufferedMutator. That's probably not what you want but you likely won't notice. I didn't. It's a code path I hadn't profiled much.

       

      3) Under 2.4.4 you get a new ThreadPoolExecutor for every BufferedMutator and that ThreadPoolExecutor is not cleaned up after the Mutator is closed. Each abandoned ThreadPoolExecutor carries with it one idle thread, which hangs around until its keep-alive timeout (60 seconds by default) expires.
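
      The effect described in 3) can be reproduced with plain java.util.concurrent, without HBase. The sketch below is an illustration under assumptions, not the client's code: the pool settings (core size 1, SynchronousQueue, 60 second keep-alive, core threads allowed to time out) are my approximation of a default htable pool, the class name is hypothetical, and the threads are made daemon threads only so the demo does not keep the JVM alive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of "one executor per mutator", with and without shutdown on close. */
final class PerMutatorPoolLeak {

    /** Builds a pool with assumed settings; the 60s keep-alive matches the default in this report. */
    static ThreadPoolExecutor newPool() {
        ThreadFactory daemonFactory = r -> {
            Thread t = new Thread(r, "demo-pool-0");
            t.setDaemon(true); // demo only: lets the JVM exit despite leaked pools
            return t;
        };
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, Integer.MAX_VALUE, 60, TimeUnit.SECONDS,
            new SynchronousQueue<>(), daemonFactory);
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    /** Simulates n short-lived mutators, each running one task on its own pool. */
    static List<ThreadPoolExecutor> simulate(int n, boolean shutdownOnClose) {
        List<ThreadPoolExecutor> pools = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ThreadPoolExecutor pool = newPool();
            pool.execute(() -> { });   // stands in for one flush; starts one worker
            if (shutdownOnClose) {
                pool.shutdown();       // what a correct close() would do
            }
            pools.add(pool);
        }
        return pools;
    }

    /** Sums worker threads still held by the pools, waiting briefly for shut-down pools. */
    static int liveWorkers(List<ThreadPoolExecutor> pools) {
        int total = 0;
        for (ThreadPoolExecutor pool : pools) {
            if (pool.isShutdown()) {
                awaitQuietly(pool);
            }
            total += pool.getPoolSize();
        }
        return total;
    }

    private static void awaitQuietly(ThreadPoolExecutor pool) {
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("without shutdown: " + liveWorkers(simulate(100, false)));
        System.out.println("with shutdown:    " + liveWorkers(simulate(100, true)));
    }
}
```

      Without the shutdown call, every simulated mutator leaves one parked worker behind for the length of the keep-alive, which is the same shape as the thread dump above.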

      My application creates one BufferedMutator for every incoming stream. There are lots of streams and some of them are short-lived, so my code leaks threads fast under 2.4.4.

      Here's the part where a new executor is created for every BufferedMutator (it's similar for 1.4.13):

      https://github.com/apache/hbase/blob/branch-2.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L420

       

      The reason for the leak in 2.4.4 is the should-we/shouldn't-we cleanup logic added here:

      https://github.com/apache/hbase/blob/branch-2.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/BufferedMutatorImpl.java#L104

      That might be OK if pool were being initialized there, but in the conn.getBufferedMutator(tableName) code path it's not. Instead, pool is initialized in conn.getBufferedMutator itself, so the executor cleanup code never runs.
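
      To make the ownership problem concrete, here is a simplified, hypothetical model of that cleanup logic using only java.util.concurrent. The class, field, and method names are mine and are not copied from BufferedMutatorImpl. The mutator shuts the pool down only when it created the pool itself, so a pool created one level up by a factory and handed in looks like a caller-owned pool and is never shut down.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical model: a mutator that only cleans up pools it created itself. */
final class OwnedPoolMutator implements AutoCloseable {
    private final ExecutorService pool;
    private final boolean cleanupPoolOnClose; // illustrative name for the ownership flag

    /** Pass null to let the mutator create, and therefore own, its pool. */
    OwnedPoolMutator(ExecutorService suppliedPool) {
        if (suppliedPool == null) {
            this.pool = Executors.newCachedThreadPool();
            this.cleanupPoolOnClose = true;
        } else {
            this.pool = suppliedPool;
            this.cleanupPoolOnClose = false; // assumed to belong to the caller
        }
    }

    @Override
    public void close() {
        if (cleanupPoolOnClose) {
            pool.shutdown();
        }
    }

    ExecutorService pool() {
        return pool;
    }

    /**
     * Models the leaking path: the factory creates a fresh pool and passes it in,
     * so the mutator never learns that nobody else will shut it down.
     */
    static OwnedPoolMutator viaFactory() {
        return new OwnedPoolMutator(Executors.newCachedThreadPool());
    }
}
```

      In this model, closing a mutator from viaFactory() leaves its pool running. Either the factory has to tell the mutator that it owns the pool, or the factory has to shut the pool down itself.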

            People

              Assignee: shahrs87 Rushabh Shah
              Reporter: whitney13 Whitney Jackson