Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Fix Version/s: 3.0.0-alpha-3, 2.4.12
- Component/s: None
- Environment: 2.4.6 with backported patches (can list if desired). Running with Temurin JDK 11.0.12+7
Description
Recently I started testing BucketCache on some of our new hbase2 clusters. When BucketCache is enabled, all disk reads go through ByteBuffAllocator in an attempt to avoid heap allocations. Without BucketCache, ByteBuffAllocator is not used for disk reads even when it is enabled.
At first this was amazing: we had close to 0% heap allocations, which drastically reduced CPU and GC time. Over time, though, I noticed that the ByteBuffAllocator pool filled up, and at that point all allocations come from the heap and our heap allocation % goes to 100%.
We were using the default max buffer count, which was 4096 for a smaller host and 7680 for a larger one. At first I figured we just needed more buffers, so I upped it to 120k. It took longer, but the pool still eventually gets exhausted.
This does not cause an OOM, because I made sure to allocate enough direct memory for max buffer count * buffer size (65k). It just causes the usedBufCount to exceed maxBufCount, resulting in 100% heap allocations going forward. The server never recovers from this state until it is restarted.
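For context on the sizing above, the pool's worst-case direct memory footprint is simply max buffer count * buffer size, which -XX:MaxDirectMemorySize has to cover on top of any off-heap BucketCache. Below is a minimal sketch of that arithmetic, assuming the standard hbase.server.allocator.* configuration keys; the class name and the fallback defaults are illustrative only.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class AllocatorSizingSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Fallback values here are illustrative; the real defaults are computed by HBase
    // (the buffer capacity that showed up in the heap dump was 66560 bytes = 65 KB).
    int maxBufCount = conf.getInt("hbase.server.allocator.max.buffer.count", 4096);
    int bufSize = conf.getInt("hbase.server.allocator.buffer.size", 65 * 1024);

    long poolBytes = (long) maxBufCount * bufSize;
    // With 120,000 buffers of 65 KB this is roughly 7.4 GB of direct memory that the
    // pool alone can pin once every buffer has been handed out.
    System.out.printf("ByteBuffAllocator pool can pin up to %.1f GB of direct memory%n",
      poolBytes / (1024.0 * 1024 * 1024));
  }
}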
Some early observations:
- Running a major compaction causes a drastic up-tick in used buffers. Major compacting a 1.5GB region could easily expand the usedBufCount by ~10,000. Most of those are never recovered. I could keep compacting the same region over and over, each time increasing the usedBufCount by 5000-15000, until max is reached.
- Despite usedBufCount increasing, direct memory usage largely does not increase. This indicates to me that the DirectByteBuffers are being reclaimed by GC, but the Recycler is not being called.
- This was confirmed with a heap dump, which showed no obvious leak in the typical sense. There were very few DirectByteBuffers with capacity 66560.
- Enabling BucketCache triggers use of ByteBuffAllocator, but I don't think it's related to the actual problem. First of all, compactions skip the cache, so that would be odd. Secondly, I disabled hbase.block.data.cacheonread and all other CacheConfig options, so the BucketCache is going unused (0 bytes used) and the problem still persists (the relevant settings are sketched just below).
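For reference, the cache settings from the last bullet boil down to a couple of configuration keys. A minimal sketch applying them programmatically; the key names are the ones mentioned above plus the usual BucketCache ones, and the size value is illustrative only.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CacheConfigSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Keep an off-heap BucketCache configured so that disk reads still go through
    // ByteBuffAllocator...
    conf.set("hbase.bucketcache.ioengine", "offheap");
    conf.set("hbase.bucketcache.size", "4096"); // MB; illustrative value only
    // ...but turn off caching of data blocks on read, so the BucketCache sits at
    // 0 bytes used while the leak still reproduces.
    conf.setBoolean("hbase.block.data.cacheonread", false);
    System.out.println("cacheonread=" + conf.getBoolean("hbase.block.data.cacheonread", true));
  }
}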
Digging deeper, I instrumented ByteBuffAllocator in two ways:
- I added some trace logging in getBuffer() and putbackBuffer() so I could see the number of allocations, returns, the pool size at the time, and a stacktrace.
- I added Netty's ResourceLeakDetector to the RefCnt class we use in ByteBuffAllocator: calling track() on creation of a RefCnt, record() in RefCnt.retain(), and close() in RefCnt.deallocate(). (A rough sketch of this wiring follows this list.)
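The leak-detector wiring looked roughly like the sketch below. This is not the actual patch: the class name TrackedRefCnt and the Runnable recycler are hypothetical, and it uses the plain io.netty packages rather than the shaded org.apache.hbase.thirdparty ones HBase actually depends on.

import io.netty.util.AbstractReferenceCounted;
import io.netty.util.ReferenceCounted;
import io.netty.util.ResourceLeakDetector;
import io.netty.util.ResourceLeakDetectorFactory;
import io.netty.util.ResourceLeakTracker;

// Hypothetical ref-counted wrapper mirroring the instrumentation described above.
// Run with -Dio.netty.leakDetection.level=paranoid to track every allocation.
public class TrackedRefCnt extends AbstractReferenceCounted {

  private static final ResourceLeakDetector<TrackedRefCnt> LEAK_DETECTOR =
      ResourceLeakDetectorFactory.instance().newResourceLeakDetector(TrackedRefCnt.class);

  private final Runnable recycler;                       // returns the buffer to the pool
  private final ResourceLeakTracker<TrackedRefCnt> leak; // track() on creation

  public TrackedRefCnt(Runnable recycler) {
    this.recycler = recycler;
    this.leak = LEAK_DETECTOR.track(this);               // may be null when sampling
  }

  @Override
  public ReferenceCounted retain() {
    if (leak != null) {
      leak.record();                                     // record() in retain()
    }
    return super.retain();
  }

  @Override
  protected void deallocate() {
    recycler.run();
    if (leak != null) {
      leak.close(this);                                  // close() in deallocate()
    }
  }

  @Override
  public ReferenceCounted touch(Object hint) {
    if (leak != null) {
      leak.record(hint);
    }
    return this;
  }
}

With this in place, any tracked object that gets garbage collected without deallocate() having run produces a LEAK report containing the recorded creation and retain() call sites, which is what produced the stacktraces below.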
The ResourceLeakDetector immediately picked up leaks on start of the regionserver. I saw leaks coming from both user requests and, as expected, the Compactor.
I collected 21 unique LEAK stacktraces, which is too many to list here. All of them, however, had 2 common roots, so below I post one full stacktrace for each of those roots:
Created at:
    org.apache.hadoop.hbase.nio.RefCnt.<init>(RefCnt.java:58)
    org.apache.hadoop.hbase.nio.RefCnt.create(RefCnt.java:53)
    org.apache.hadoop.hbase.nio.ByteBuff.wrap(ByteBuff.java:597)
    org.apache.hadoop.hbase.io.ByteBuffAllocator.allocate(ByteBuffAllocator.java:350)
    org.apache.hadoop.hbase.io.hfile.HFileBlock.allocateBuffer(HFileBlock.java:671)
    org.apache.hadoop.hbase.io.hfile.HFileBlock.unpack(HFileBlock.java:640)
    org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1340)
    org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:331)
    org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:679)
    org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:631)
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:315)
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:216)
    org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:417)
    org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:257)
    org.apache.hadoop.hbase.regionserver.HStore.createScanner(HStore.java:2152)
    org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2142)
    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:7211)
    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:7191)
    org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:3160)
    org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:3140)
    org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:3122)
    org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:3116)
    org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2657)
    org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:807)
    org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2898)
    org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45961)
    org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392)
    org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:134)
    org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:358)
    org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
and
Created at:
    org.apache.hadoop.hbase.nio.RefCnt.<init>(RefCnt.java:58)
    org.apache.hadoop.hbase.nio.RefCnt.create(RefCnt.java:53)
    org.apache.hadoop.hbase.nio.ByteBuff.wrap(ByteBuff.java:597)
    org.apache.hadoop.hbase.io.ByteBuffAllocator.allocate(ByteBuffAllocator.java:350)
    org.apache.hadoop.hbase.io.hfile.HFileBlock.allocateBuffer(HFileBlock.java:671)
    org.apache.hadoop.hbase.io.hfile.HFileBlock.unpack(HFileBlock.java:640)
    org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1340)
    org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readAndUpdateNewBlock(HFileReaderImpl.java:968)
    org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:957)
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:322)
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:216)
    org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:417)
    org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:324)
    org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:290)
    org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:513)
    org.apache.hadoop.hbase.regionserver.compactions.Compactor$1.createScanner(Compactor.java:254)
    org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:333)
    org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:65)
    org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:126)
    org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1510)
    org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2390)
    org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:617)
    org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:659)
    java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    java.base/java.lang.Thread.run(Thread.java:829)
My trace logging gave some insight as well:
- It showed stacktraces very similar to the above for the allocations.
- It also confirmed that we are indeed returning many buffers to the pool, coming from calls to Shipper.shipped() in Compactor and ServerCall.setResponse for user requests.
- However, the count of allocations vs. returns further validated the leak. We might have 150k allocations but only 140k returns (as an example).
- Similarly, the pool size (buffers.size()) was always less than 1000, sometimes close to 0. Once usedBufCount > maxBufCount, the pool size shrinks to 0 and stays at 0 forever. (A rough sketch of how these counts were gathered follows this list.)
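The counting side of the trace instrumentation was conceptually just a pair of counters plus the pool size logged around getBuffer()/putbackBuffer(). The sketch below is a standalone illustration, not the actual patch to ByteBuffAllocator; the class name and log format are hypothetical.

import java.nio.ByteBuffer;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

public class CountingBufferPool {
  private final Queue<ByteBuffer> buffers = new ConcurrentLinkedQueue<>();
  private final AtomicLong allocations = new AtomicLong();
  private final AtomicLong returns = new AtomicLong();
  private final int bufSize;

  public CountingBufferPool(int bufSize) {
    this.bufSize = bufSize;
  }

  public ByteBuffer getBuffer() {
    ByteBuffer buf = buffers.poll();
    if (buf == null) {
      buf = ByteBuffer.allocateDirect(bufSize);
    }
    long allocs = allocations.incrementAndGet();
    // In the real instrumentation this was a trace log line plus a stacktrace; a steadily
    // growing gap between allocations and returns is the signature of the leak.
    System.out.printf("alloc #%d, returns=%d, poolSize=%d%n", allocs, returns.get(), buffers.size());
    return buf;
  }

  public void putbackBuffer(ByteBuffer buf) {
    buf.clear();
    buffers.offer(buf);
    returns.incrementAndGet();
  }
}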
Things I've tried:
- I've been poring over the stacktraces, trying to see how we might leak. Certainly we do some extra allocations when unpacking compressed blocks, but it seems like we always release them. I also tried picking through the code from bottom to top, trying to confirm that we're closing everything, and it seems like we are. This code is complicated, though, so I could easily have missed something.
- I tried setting COMPRESSION => NONE on the table and compacting a region in it; this still caused a massive expansion/leak of buffers.
- We don't fully use cell block encoding, and I wondered if ServerCall makes some assumptions about it with regard to when blocks are released. I enabled cell block encoding for all clients of a cluster and the leak still occurred. It's worth noting our client is version 1.2, which doesn't support cell block encoding for plain Gets, so we aren't fully enabled; but I read through the ServerCall code and it doesn't seem like we make any such assumptions.
Issue Links
- blocks: HBASE-27097 SimpleRpcServer is broken (Resolved)
- is duplicated by: HBASE-27173 Consider adding ResourceLeakDetector to RefCnt (Resolved)
- relates to: HBASE-26708 Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering with SASL implementation (Resolved)