DBB got from BucketCache would be freed unexpectedly before RPC completed for following two cases:
- Case 1: Replacing Block and getting Block execute concurrently.
Because of splitting or stream read，two or more read threads may try to cache the same Block concurrently and one thread read all of the next block header could replace the another one didn't(see
HBASE-20447). When replacing Block，the following BucketCache.WriterThread.putIntoBackingMap method invokes BucketCache.blockEvicted in line 916,which free the BucketEntry directly in line 544, not through the RefCount.
Freeing the BucketEntry directly may cause HFileBlock returned from BucketCache.getBlock be freed unexpectedly before RPC completed.
HBASE-26155fixed this problem to some extent, but seems not completely. The BucketCache.getBlock might be invoked just before BucketCache.cacheBlockWithWaitInternal and after the BucketEntry.isRpcRef checking is done. Considering the following sequence:
1. Block1 was cached successfully,the RefCnt of Block1 is 1.
2. Thread1 caching the same BlockCacheKey with Block2 satisfied BlockCacheUtil.shouldReplaceExistingCacheBlock, so Block2 would replace Block1, but thread1 stopping before BucketCache.cacheBlockWithWaitInternal
3. Thread2 invoking BucketCache.getBlock with the same BlockCacheKey, which returned Block1, the RefCnt of Block1 is 2.
4. Thread1 continues caching Block2, in BucketCache.WriterThread.putIntoBackingMap , the old Block1 is freed directly which RefCnt is 2, but the Block1 is still used by Thread2 and the content of Block1 would be overwitten after it is freed, which may cause a serious error.
- Case 2: Evicting Block , caching Block and getting Block execute concurrently.
Considering the following sequence:
1. Thread1 caching Block1, stopping after WriterThread.putIntoBackingMap, the RefCnt of Block1 is 1.
2. Thread2 invoking BucketCache.evictBlock with the same BlockCacheKey , but stopping after BucketCache.removeFromRamCache.
3. Thread3 invoking BucketCache.getBlock with the same BlockCacheKey, which returned Block1, the RefCnt of Block1 is 2.
4. Thread1 continues caching block1,but finding that BucketCache.RAMCache.remove returning false, so invoking BucketCache.blockEvicted to free the the Block1 directly
which RefCnt is 2 and the Block1 is still used by Thread3.
Case 2 is caused by following line 1014 in WriteThread.doDrain, which also free the BucketEntry directly,just like the Case1.
My Fix in the PR is as following:
- We never free the BucketEntry directly, freeing the BucketEntry must through RefCnt.
- RefCnt#recycler just freeing the corresponding BucketEntry , not including removing the Block from BucketCache.backingMap or BucketCache.ramCache.
Removing the Block from BucketCache.backingMap or BucketCache.ramCache is immediately done in BucketCache.evictBlock, and in fact this behavior is the same as
BucketCache.evictBlock in branch-1, except that freeing BucketEntry is delayed until RefCnt becoming 0.
In current branch-2 implementation, removing the Block from BucketCache.backingMap or BucketCache.ramCache is done in RefCnt#recycler, which is something
unreasonable: we decreasing the RefCnt because we removing it from BucketCache.backingMap, not in reverse that because the RefCnt decreased to zero we
removing it from BucketCache.backingMap, which may causes the newly added Block with the same BlockCacheKey be incorrectly evicted from BucketCache.backingMap