khemani updated the revision "HBASE-5347 [jira] GC free memory management in Level-1 Block Cache".
Reviewers: Kannan, mbautin, dhruba, nspiegelberg, stack, JIRA
In this iteration I have been testing these changes by starting 1000 threads in the region server that continuously call incrementColumnValue() on large random keys. This patch contains some temporary changes to HRegionInterface to do this testing ...
In the test that I run on my dev cluster, the test table holds enough data, the Java heap is set to 24GB, and the block cache is set to 60% of the heap. Without this patch I run into GC pauses fairly quickly. With this patch, even without any blocks-out-of-large-slab allocation and without changing to synchronous eviction in the LRU block cache, there are no GC pauses.
So, there are two kinds of objects being reference counted in this patch - HFileBlocks and KeyValues. HFileBlocks are easy to reference count because they only exist in HFileScanners and below. KeyValues (the ones that directly refer to data in an HFileBlock) are more difficult to reference count.
My earlier approach was that the StoreFileScanner will hand out pinned key-values and the higher layers will not have to bother about deref()ing the key-values. The key-values will be deref()'d only when they are written out. But this approach has many issues and quickly becomes non-intuitive. For example, with that older approach, StoreFileScanner will not be able to deref() its cur kv when it is closed and it will have to rely on the GC for cleanup.
In the current approach you have to ref() the kv that you get from a scanner if you are going to keep it beyond that scanner's next call to next(), seek(), reseek() or close(). (If you don't ref it then you will soon get a NullPointerException.) You should also deref it when you are done, but it is not absolutely required.
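A minimal sketch of this contract, for illustration only. RefCountedKv and SketchScanner are hypothetical stand-ins, not the classes in the patch, and the sketch is single-threaded: the scanner owns one reference to its current kv and drops it on the next move, so a caller that wants to keep the kv must take its own reference first.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a KeyValue that points into a pooled block. */
class RefCountedKv {
    // The scanner that hands out the kv holds the first reference.
    private final AtomicInteger refs = new AtomicInteger(1);
    private volatile byte[] backing; // stand-in for the block's buffer

    RefCountedKv(byte[] backing) { this.backing = backing; }

    /** Take an extra reference so the kv survives the scanner's next move. */
    RefCountedKv ref() {
        refs.incrementAndGet();
        return this;
    }

    /** Drop one reference; the backing data is released at zero. */
    void deref() {
        if (refs.decrementAndGet() == 0) {
            backing = null; // a real pool would take the block back here
        }
    }

    /** Returns null once the kv has been released. */
    byte[] bytes() { return backing; }
}

/** Hypothetical scanner that derefs its current kv whenever it moves. */
class SketchScanner {
    private final byte[][] rows;
    private int pos = -1;
    private RefCountedKv cur;

    SketchScanner(byte[][] rows) { this.rows = rows; }

    RefCountedKv next() {
        if (cur != null) cur.deref(); // the previous kv loses the scanner's reference
        pos++;
        cur = pos < rows.length ? new RefCountedKv(rows[pos]) : null;
        return cur;
    }

    void close() {
        if (cur != null) { cur.deref(); cur = null; }
    }
}
```

With these stand-ins, a kv obtained as `scanner.next().ref()` still has its bytes after the following next() call, while a kv that was not ref()'d returns null from bytes(), which is where the NullPointerException would come from.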
I have a lot more cleanup to do. I will revert some of the method name changes. I will refactor the code into new classes. There are a lot of tests to be written. I will rewrite this so that it can be run with or without the hfile-block-pool. The block pool enhancements and LRU block cache enhancements will go in separate diffs.
This changes the programming model quite a bit ...
Some numbers from the test that I am running:
49 million KVs have been reference counted. Out of that somehow the code forgets to deref 17 of them. About 2 million of them end up on the deadKVs ReferenceQueue even though they have been properly deref'd ... don't know why this happens.
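A minimal sketch of how such an audit queue could be wired up, under assumptions: TrackedRef and LeakAudit are hypothetical names, and this is not the patch's code. It relies only on standard java.lang.ref behaviour: a reference registered with a ReferenceQueue is enqueued once its referent is collected, as long as the reference object itself is still reachable and has not been cleared, whether or not deref() was called. That may be one reason properly deref'd KVs still show up on the queue.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical weak reference that remembers whether its kv was deref'd. */
class TrackedRef extends WeakReference<Object> {
    final AtomicBoolean released = new AtomicBoolean(false);

    TrackedRef(Object kv, ReferenceQueue<Object> queue) { super(kv, queue); }

    /** Call when the kv's reference count reaches zero. */
    void markReleased() { released.set(true); }
}

/** Hypothetical audit that separates leaked kvs from cleanly released ones. */
class LeakAudit {
    final ReferenceQueue<Object> deadKVs = new ReferenceQueue<>();
    int leaked;
    int clean;

    /** The caller must keep the returned TrackedRef reachable for it to be enqueued. */
    TrackedRef track(Object kv) { return new TrackedRef(kv, deadKVs); }

    /** Drain the queue and classify every reference found on it. */
    void drain() {
        Reference<?> r;
        while ((r = deadKVs.poll()) != null) {
            if (((TrackedRef) r).released.get()) clean++; else leaked++;
        }
    }
}
```

If the intent is for the queue to contain only forgotten KVs, one option (an assumption about the design, not something the patch is known to do) is to call clear() on the tracking reference at deref time, since a cleared reference is not enqueued by the collector.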
The block cache has 10 GB of active blocks. 1.5GB is maintained in the free pool by the eviction process. We will be able to use all the block cache memory (that is around 14.5GB) once LRUBlockCache eviction is made synchronous.
235K allocations were done from the system. Some of these were blocks larger than 69K, so they were not managed by the pool. The blocks got reused 5.5 million times.
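The two counters above can be illustrated with a minimal pool sketch. BlockPoolSketch, slabSize and the counter names are hypothetical, and the real pool in the patch may be organized quite differently: requests that fit in a slab are served from a free list when possible, and anything larger bypasses the pool and goes straight to the system.

```java
import java.util.ArrayDeque;

/** Hypothetical fixed-size block pool with allocation and reuse counters. */
class BlockPoolSketch {
    private final int slabSize;
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    long systemAllocations; // buffers obtained with new byte[]
    long reuses;            // buffers served from the free list

    BlockPoolSketch(int slabSize) { this.slabSize = slabSize; }

    synchronized byte[] allocate(int size) {
        if (size > slabSize) {
            // Too large for a slab: not managed by the pool.
            systemAllocations++;
            return new byte[size];
        }
        byte[] b = free.poll();
        if (b != null) {
            reuses++;
            return b;
        }
        systemAllocations++;
        return new byte[slabSize];
    }

    synchronized void release(byte[] b) {
        // Only slab-sized buffers go back to the pool; oversized ones are left to the GC.
        if (b.length == slabSize) free.push(b);
    }
}
```

In this sketch a high reuses-to-systemAllocations ratio means most block reads recycle existing buffers instead of creating new garbage.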