HBASE-11425: Cell/DBB end-to-end on the read-path

    Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.99.0
    • Fix Version/s: 2.0.0
    • Component/s: regionserver, Scanners
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Umbrella JIRA to make sure we can have blocks cached in an offheap backed cache. In the entire read path, we can refer to this offheap buffer and avoid onheap copying.
      The high level items I can identify as of now are:
      1. Avoid the array() call on BB in the read path. (This is there in many classes. We can handle it class by class.)
      2. Support Buffer based getter APIs in Cell. In the read path we will create a new Cell backed by a BB. Will be needed in CellComparator, Filter (like SCVF), CPs etc. (A rough sketch of such an API is given after this list.)
      3. Avoid KeyValue.ensureKeyValue() calls in the read path - this makes a byte copy.
      4. Remove all CP hooks (which are already deprecated) which deal with KVs (in the read path).

      Will add subtasks under this.
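
      A rough sketch of what such buffer based getter APIs could look like is below. The interface and method names here are only illustrative, not a committed API.

{code:java}
import java.nio.ByteBuffer;

// Illustrative sketch only: a Cell extension exposing ByteBuffer based accessors so the
// read path can refer to (possibly offheap) block memory without copying it onheap.
interface BufferBackedCell /* extends Cell */ {
  /** Buffer (maybe a slice of a cached, offheap block) holding the row key bytes. */
  ByteBuffer getRowByteBuffer();
  /** Position of the row key inside the returned buffer; avoids creating sliced BBs. */
  int getRowPosition();
  short getRowLength();

  ByteBuffer getValueByteBuffer();
  int getValuePosition();
  int getValueLength();
  // ... similar accessor triplets for family, qualifier and tags.
}
{code}

      A comparator or filter would then read directly from the buffer at the given position instead of calling getXXXArray() and copying.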

      Attachments

      1. Benchmarks_Tests.docx
        46 kB
        Anoop Sam John
      2. BenchmarkTestCode.zip
        10 kB
        Anoop Sam John
      3. gc.png
        31 kB
        stack
      4. GC pics with evictions_4G heap.png
        55 kB
        ramkrishna.s.vasudevan
      5. gets.png
        20 kB
        stack
      6. HBASE-11425.patch
        976 kB
        ramkrishna.s.vasudevan
      7. HBASE-11425-E2E-NotComplete.patch
        837 kB
        Anoop Sam John
      8. heap.png
        25 kB
        stack
      9. load.png
        31 kB
        stack
      10. median.png
        41 kB
        stack
      11. Offheap reads in HBase using BBs_final.pdf
        94 kB
        ramkrishna.s.vasudevan
      12. Offheap reads in HBase using BBs_V2.pdf
        115 kB
        Anoop Sam John
      13. ram.log
        8.46 MB
        stack
      14. Screen Shot 2015-10-16 at 5.13.22 PM.png
        442 kB
        stack

        Issue Links

        1.
        Support KeyValueCodec to encode non KeyValue cells. Sub-task Closed Anoop Sam John
         
        2.
        Support DirectByteBuffer usage in HFileBlock Sub-task Closed Anoop Sam John
         
        3.
        HFileBlock backed by Array of ByteBuffers Sub-task Resolved ramkrishna.s.vasudevan
         
        4.
        Facilitate using ByteBuffer backed Cells in the HFileReader Sub-task Resolved Unassigned
         
        5.
        Ensure Cells and its implementations work with Buffers also Sub-task Resolved Unassigned
         
        6.
        Support DirectByteBuffer usage in DataBlock Encoding area Sub-task Closed Anoop Sam John
         
        7.
        Avoid onheap buffer copying at RPCServer#serResponse Sub-task Resolved Anoop Sam John
         
        8.
        Column trackers and delete trackers should deal with BBs Sub-task Resolved Unassigned
         
        9.
        Prevent block eviction under us if reads are in progress from the BBs Sub-task Resolved ramkrishna.s.vasudevan
         
        10.
        Filters should work with ByteBufferedCell Sub-task Resolved Anoop Sam John
         
        11.
        Support DBB usage in Bloom and HFileIndex area Sub-task Closed Anoop Sam John
         
        12.
        Support BB usage in PrefixTree Sub-task Resolved ramkrishna.s.vasudevan
         
        13.
        Memstore and MemstoreScanner should work with BBs. Sub-task Resolved Unassigned
         
        14.
        Redo the hfile index length optimization so cell-based rather than serialized KV key Sub-task Closed stack
         
        15.
        Unsafe based ByteBuffer Comparator Sub-task Resolved Anoop Sam John
         
        16.
        Create ByteBuffer backed Cell Sub-task Resolved Unassigned
         
        17.
        Add unsafe putXXXUnsafe() BB methods to ByteBufferUtils Sub-task Resolved Unassigned
         
        18.
        Change DBEs to work with new BB based cell Sub-task Resolved Anoop Sam John
         
        19.
        Tags to work with ByteBuffer Sub-task Resolved Anoop Sam John
         
        20.
        Add ByteBufferedCell an extension to Cell Sub-task Resolved Anoop Sam John
         
        21.
        Remove deprecated seek/reseek methods from HFileScanner Sub-task Resolved Anoop Sam John
         
        22.
        Deprecate Filter#filterRowKey(byte[] buffer, int offset, int length) in favor of filterRowKey(Cell firstRowCell) Sub-task Resolved Anoop Sam John
         
        23.
        Deprecate RegionObserver#postScannerFilterRow CP hook with byte[],int,int args in favor of taking Cell arg Sub-task Resolved Anoop Sam John
         
        24.
        Change ColumnTracker and SQM to deal with Cell instead of byte[], int, int Sub-task Resolved Anoop Sam John
         
        25.
        Delayed scanner close in KeyValueHeap and StoreScanner Sub-task Resolved Anoop Sam John
         
        26.
        Change RegionScannerImpl to deal with Cell instead of byte[], int, int Sub-task Resolved Anoop Sam John
         
        27.
        Create MultiByteBuffer an aggregation of ByteBuffers Sub-task Resolved Anoop Sam John
         
        28.
        Close the scanner only after Call#setResponse Sub-task Resolved Anoop Sam John
         
        29.
        Use BufferBackedCell in read path after HBASE-12213 and HBASE-12295 Sub-task Resolved ramkrishna.s.vasudevan
         
        30.
        Change ByteBuff.getXXXStrictlyForward to relative position based reads Sub-task Resolved Anoop Sam John
         
        31.
        ByteBufferUtils#compareTo small optimization Sub-task Resolved Anoop Sam John
         
        32.
        Bloomfilter path to work with Byte buffered cells Sub-task Resolved ramkrishna.s.vasudevan
         
        33.
        Reduce garbage we create Sub-task Resolved Anoop Sam John
         
        34.
        Short circuit last byte check in CellUtil#matchingXXX methods for ByteBufferedCells Sub-task Resolved Anoop Sam John
         
        35.
        Create the fake keys required in the scan path to avoid copy to byte[] Sub-task Resolved ramkrishna.s.vasudevan
         
        36.
        Small optimization in SingleByteBuff Sub-task Resolved Anoop Sam John
         
        37.
        Short circuit checks in ByteBufferUtils compare/equals Sub-task Resolved Unassigned
         
        38.
        Shorten ByteBufferedCell#getXXXPositionInByteBuffer method name Sub-task Resolved Anoop Sam John
         
        39.
        Clear HFileScannerImpl#prevBlocks in between Compaction flow Sub-task Resolved Anoop Sam John
         
        40.
        Ensure write paths work with ByteBufferedCells in case of compaction Sub-task Resolved ramkrishna.s.vasudevan
         
        41.
        Support OffheapKV write in compaction with out copying data on heap Sub-task Resolved Anoop Sam John
         
        42.
        Tightening of the CP contract Sub-task Resolved Anoop Sam John
         
        43.
        Unnecessary lock in ByteBufferArray Sub-task Resolved Anoop Sam John
         
        44.
        References to previous cell in read path should be avoided Sub-task Resolved ramkrishna.s.vasudevan
         
        45.
        LRUBlockCache#returnBlock should try return block to Victim Handler L2 cache. Sub-task Resolved Anoop Sam John
         
        46.
        L1 cache caching shared memory HFile block when blocks promoted from L2 to L1 Sub-task Resolved Anoop Sam John
         

          Activity

          anoop.hbase Anoop Sam John added a comment -

          Now one related Q is whether we go with BR rather than BB for the APIs.

          apurtell Andrew Purtell added a comment -

          Now one related Q is whether we go with BR rather than BB for the APIs.

          +1

          We need BR instead of BB to work around BB API issues: inlining pessimism, range checking, and index compensations that cannot be skipped for performance, and related.
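
          To make that concrete, the kind of thing a BR (ByteRange-style) accessor buys us is an absolute, caller-bounded read, roughly like the sketch below. The class is hypothetical, not the actual ByteRange API.

{code:java}
// Hypothetical sketch of a ByteRange-style accessor: the caller owns the bounds, so a
// read is plain array indexing with no extra limit/position bookkeeping as in
// java.nio.ByteBuffer (only the JVM's implicit array bounds check remains), which
// keeps hot compare loops small enough to inline well.
final class SimpleRange {
  private final byte[] bytes;
  private final int offset;
  private final int length;

  SimpleRange(byte[] bytes, int offset, int length) {
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }

  byte get(int index) {
    return bytes[offset + index];
  }

  int getLength() {
    return length;
  }
}
{code}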

          ram_krish ramkrishna.s.vasudevan added a comment -

          Avoid KeyValue.ensureKeyValue() calls in the read path - this makes a byte copy.

          This is the idea behind changing to Cells in the read path. There are still some places where this has not been achieved. I will add those tasks to this.

          We need BR instead of BB to work around BB API issues

          Yes, +1 on this. HBASE-10772 and HBASE-10773 are related to this. I can link the related tasks to this JIRA to give a clear picture of the subtasks.

          ram_krish ramkrishna.s.vasudevan added a comment -

          All the subtasks under HBASE-7320 are in some way helping this. A few more may be needed here.

          stack stack added a comment -

          Thanks for filing this one and yeah, let's finish up the Cell conversions.

          How are we thinking to back Cells w/ bytes that are in the block cache? Currently we copy the block cache bytes onheap to guard against the blocks being evicted out from under us. We'll do reference counting?

          Any idea of how much slower an offheap merge sort will be doing BB#get (or BR#get)? I'm up for doing a bit of measuring....

          ram_krish ramkrishna.s.vasudevan added a comment -

          Currently we copy the block cache bytes onheap to guard against the blocks being evicted out from under us. We'll do reference counting?

          I think this is one major area where we may have to work. If a block is currently being scanned, set a reference and do not evict it. On what basis should we select the next block that can be evicted? Need to do some more analysis on this.
          Interesting!
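
          One possible shape of that, purely as a sketch (hypothetical names, not a committed design): each cached block carries a reference count that readers bump while they hold Cells over its memory, and the evictor skips blocks whose count is non-zero.

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of refcount-guarded eviction for a bucket cache entry.
// A real implementation would also have to make tryEvict atomic with respect to
// concurrent retain() calls; this only shows the idea.
final class CachedBlock {
  private final AtomicInteger refCount = new AtomicInteger(0);

  void retain() {                 // called when a scanner starts reading Cells off this block
    refCount.incrementAndGet();
  }

  void release() {                // called once the scanner is done with the block
    refCount.decrementAndGet();
  }

  boolean tryEvict() {            // evictor may free the backing buckets only when unused
    return refCount.get() == 0;
  }
}
{code}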

          anoop.hbase Anoop Sam John added a comment -

          Yes, we need ref counting. The MemstoreSlab chunk pool has a similar need and handles it with ref counting.

          Any idea of how much slower an offheap merge sort will be doing BB#get (or BR#get)? I'm up for doing a bit of measuring....

          Have not done any comparison test yet. Will do some plain tests doing just key compares (millions of times) reading from DBB and HBB (this will use the unsafe compare). Got into some bug fixes on visibility; will start again next week. It would be great if you can measure, boss.

          anoop.hbase Anoop Sam John added a comment -

          We need BR instead of BB to work around BB API issues: inlining pessimism, range checking, and index compensations that cannot be skipped for performance, and related.

          Yes. So we will have our own HeapBB/DirectBB implementations rather than just wrapping the nio objects.

          anoop.hbase Anoop Sam John added a comment -

          Testing with 2 million Cells with a single cell per row.
          Writing all cells to a BB/DBB and trying a seek to the last KV (to force a compare across all cells in the BB/DBB).
          Seek code is like what we have in ScannerV3#blockSeek.

          With RK length 17 bytes (first 13 bytes are the same), getting almost the same result.
          With RK length 117 bytes (first 113 bytes are the same), the DBB based read shows a ~3% degradation.

          apurtell Andrew Purtell added a comment -

          This is with BB or BR? How about 1kb RKs?

          anoop.hbase Anoop Sam John added a comment -

          BB only. And not actual read perf numbers; just the seek part, and so reads from BB vs DBB.

          How about 1kb RKs?

          Will test more cases.

          apurtell Andrew Purtell added a comment -

          Does it make a difference if using the BR API instead?

          stack stack added a comment -

          Nice Anoop Sam John. Mind posting your little test tool so I can run it locally?

          anoop.hbase Anoop Sam John added a comment -

          Testing with 2 million Cells with a single cell per row.
          Writing all cells to a BB/DBB and trying a seek to the last KV (to force a compare across all cells in the BB/DBB).
          Seek code is like what we have in ScannerV3#blockSeek.
          With RK length 17 bytes (first 13 bytes are the same), getting almost the same result.
          With RK length 117 bytes (first 113 bytes are the same), the DBB based read shows a ~3% degradation.

          Well, in this test the read and compare were from an HBB and a DBB, and those are almost the same.
          In the case of our CellComparator we have an Unsafe based optimization. In my old test this was not in use. With Unsafe based reads from HBB#array() [this is what happens in HFileReaderV2/V3] there is a significant perf diff with DBB. With an RK length of 117 bytes and 2 million cells, seeking to the last cell, the DBB test is 50% slower.

          I am thinking of doing Unsafe based compares for data in DBB as well.

          Just done Unsafe based access from DBB/HBB and then we are in better shape. The above DBB based test is ~12% slower than the old HBB.array() based compares. Will raise a subtask and attach the approach there.
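
          Roughly, the trick is to compare against the backing memory directly: for a heap BB via its array plus Unsafe's array base offset, for a direct BB via its native address. A simplified, byte-at-a-time sketch is below (a real implementation would read 8 bytes per step); this is a hypothetical helper, not the actual ByteBufferUtils code.

{code:java}
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import sun.misc.Unsafe;

// Simplified sketch of an Unsafe backed ByteBuffer comparison. Assumes heap buffers
// have an accessible backing array.
final class UnsafeBBCompare {
  private static final Unsafe UNSAFE;
  private static final long BYTE_ARRAY_BASE;
  static {
    try {
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      UNSAFE = (Unsafe) f.get(null);
      BYTE_ARRAY_BASE = UNSAFE.arrayBaseOffset(byte[].class);
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  static int compare(ByteBuffer a, int aPos, ByteBuffer b, int bPos, int len) {
    for (int i = 0; i < len; i++) {
      int x = getByte(a, aPos + i) & 0xff;
      int y = getByte(b, bPos + i) & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    return 0;
  }

  private static byte getByte(ByteBuffer buf, int index) {
    if (buf.isDirect()) {
      // Direct BB: read straight from native memory at address + index.
      return UNSAFE.getByte(((sun.nio.ch.DirectBuffer) buf).address() + index);
    }
    // Heap BB: read from the backing array, skipping ByteBuffer's own position/limit checks.
    return UNSAFE.getByte(buf.array(), BYTE_ARRAY_BASE + buf.arrayOffset() + index);
  }
}
{code}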

          ram_krish ramkrishna.s.vasudevan added a comment -

          A document explaining the motive, the design considerations, the reasons for arriving at BB, and the Cell level API changes required for supporting offheap memory in HBase's read path.
          We will be uploading a patch ported to trunk by the end of this week or early next week, along with some perf results.
          Request feedback/comments on the doc and the approach.

          Apache9 Duo Zhang added a comment -

          I'm still a little worried about the ref counting part, as I said before.
          Sometimes it could be a disaster for later developers, because it is easy to miss a decrement but very hard to know that a problem is caused by a missing decrement, and even if we know, it is hard to find where we missed it...
          Let's see the code; maybe we can find a way to handle it cleanly.

          BTW, there seems to be a typo: you mean HBASE-13142 at the end of the document? We do not have HBASE-13412 yet.

          anoop.hbase Anoop Sam John added a comment -

          The ref count increment/decrement happens in one place. You will get more detail when seeing the code.
          Yes, it should be HBASE-13142. Thanks for correcting.
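
          As an illustration of the "one place" idea (sketch only, reusing the hypothetical CachedBlock from the eviction discussion above): the scanner pins the block once when it takes it and releases it in a single finally block, so a missed decrement cannot hide in scattered call sites.

{code:java}
// Sketch: confine the decrement to exactly one place so leaks are easy to audit.
final class BlockReader {
  void scanBlock(CachedBlock block) {
    block.retain();               // single increment when the scanner takes the block
    try {
      // ... create Cells pointing into the block's buffers and process them ...
    } finally {
      block.release();            // single decrement, guaranteed even on exceptions
    }
  }
}
{code}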

          Apache9 Duo Zhang added a comment -

          Anoop Sam John increment/decrement in one place sounds good to me.

          stack stack added a comment -

          Thanks for the writeup. Makes it easier discussing this new dev.

          "Typical used value for max heap size is 32-48 GB."

          This ain't right, is it? Usually we have folks hover just below 32G so can do compressed pointers.

          "Each bucket’s size is fixed to 4KB."

          Should bucket size be same as the hfile block size?

          Can MBB be developed in isolation with tests and refcounting tests apart from main code base? Is that being done?

          High-level, general question: So eviction was easy before. When memory pressure just evict until needed memory is made available. The eviction is now made more complicated because have to check for non-zero refcount? And what if can't find necessary memory? What happens?

          "Note that the LRU Cache does not have this block reference counting happening as that does not deal with BBs and deals with the HFileblock objects directly."

          Why not? We copy from the LRU blocks to Cell arrays? Couldn't Cells go against the LRU blocks directly too? Or I have it wrong?

          I don't see a downside listing that we'll be doubling the objects made when offheap reading. Is that right?

          "Please note that the Cells in the memstore are still KV based (byte [] backed)" ... this is because you are only doing read-path in this JIRA, right? Then again, reading, we have to read from the MemStore so this means that read path can be a mix of onheap and offheap results?

          On adding new methods to Cell, are there 'holes'? We talked about this in the past and it seemed like there could be strange areas in the Cell API if you did certain calls. If you don't know what I am on about, I'll dig up the old discussion (I think it was on mailing list... Ram you asked for input).

          ... or maybe the holes have been plugged by 'Using getXXXArray() would throw UnSupportedOperationException. '? And....
          "This will make so many short living objects creation also. That is why we decided to go with usage of getXXXOffset() and getXXXLength() API usage also along with buffer based APIs"

          So, you might want to underline this point. Its BB but WE are managing the position and length to save on object creation and to bypass BB range checking, etc.

          What does that mean for the 'client'? When you give out a BB, its position, etc., is not to be relied upon. That will be disorientating. Pity you couldn't throw unsupportedexception if they tried use position, etc. So you need BB AND the Cell to get at content. BB for the array and then Cell for the offset and length...

          So, this API is for users on client-side? It is going to confuse them when they have a BB but the position and limit are duds. In client, when would they be doing BB? Never? Client won't be offheaping? If so, could the BB APIs be mixed in to Cell on the server only?

          So, why have the switch at all? The hasArray switch? Why not BB it all the time? It would simplify the read path. Disadvantage would be it'd be extra objects?

          When you say this: "Note that even if the HFileBlock is on heap BB we do not support getXXXArray() APIs. " This is only if hasArray returns false, right?

          Yeah, looks like 2.0.

          Tell us more about the unsafe manipulation of BBs? How's that work?

          Nice writeup.

          vrodionov Vladimir Rodionov added a comment -

          I am quite skeptical about this whole idea ... Here is why:

          An off heap cache can store blocks in compressed form. It means that you won't be able to back an HFileBlock by such a compressed block - you have to decompress it first. From a performance point of view it does not matter whether you do this into a direct BB (new approach) or into a byte array-backed BB (existing).

          Am I missing anything?

          anoop.hbase Anoop Sam John added a comment -

          Having block data in compressed form in the BC is an optional thing. In such a case, yes, we have to decompress first, and at that time it can be into a byte array backed BB. We are not trying to change that.
          The change is for when the data is cached in non compressed form (but it can be in DBE form). Then we avoid the need for a copy. The block can be backed by N offheap buckets, Cells are made out of that, and those cells are then backed by buffers rather than byte[].
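
          Conceptually (just a sketch, not the eventual class), a block spread over N buckets can be exposed as one logical buffer by routing each absolute index to the right fragment:

{code:java}
import java.nio.ByteBuffer;

// Sketch of a "multi byte buffer": one logical view over N (possibly offheap) ByteBuffers,
// so an HFileBlock can sit across several bucket-cache buckets without copying.
// Hypothetical class; fragments are assumed to start at position 0, and a real
// implementation would also offer bulk/primitive reads and a binary search.
final class MultiBuffer {
  private final ByteBuffer[] items;   // fragments, e.g. fixed-size buckets
  private final int[] itemBeginPos;   // cumulative start offset of each fragment

  MultiBuffer(ByteBuffer[] items) {
    this.items = items;
    this.itemBeginPos = new int[items.length + 1];
    int off = 0;
    for (int i = 0; i < items.length; i++) {
      itemBeginPos[i] = off;
      off += items[i].limit();
    }
    itemBeginPos[items.length] = off;
  }

  byte get(int globalIndex) {
    int item = findItem(globalIndex);
    return items[item].get(globalIndex - itemBeginPos[item]);
  }

  private int findItem(int globalIndex) {
    for (int i = 0; i < items.length; i++) {
      if (globalIndex < itemBeginPos[i + 1]) {
        return i;
      }
    }
    throw new IndexOutOfBoundsException(String.valueOf(globalIndex));
  }
}
{code}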

          anoop.hbase Anoop Sam John added a comment -

          This ain't right, is it? Usually we have folks hover just below 32G so can do compressed pointers.

          I think I have seen in mails that some users have 48G also, at least some users who were trying some PoCs (I met them offline). Anyway, can change to ~32G.

          Should bucket size be same as the hfile block size?

          I wanted to come to this topic. This is hard coded now. Can this be made configurable? If this can be a larger value, like the block size, it would be better as per our changes.

          Can MBB be developed in isolation with tests and refcounting tests apart from main code base? Is that being done?

          Yep. When we put up patches, we can make sure to do it this way. Those are in the sub-tasks.

          The eviction is now made more complicated because have to check for non-zero refcount? And what if can't find necessary memory? What happens?

          The eviction tries to evict some unused blocks. If all of them have reads in progress (worst case), the new block cannot be cached. Maybe that should be tried again after a delay?

          Why not? We copy from the LRU blocks to Cell arrays? Couldn't Cells go against the LRU blocks directly too? Or I have it wrong?

          In the LRU we cache the block object itself. It has its own underlying memory. Even if a block with a read in progress is evicted, the memory area it refers to is not freed. The only thing is that after this read, that block will no longer be referenced, and so neither will the block data area. Am I making it clear?

          I don't see a downside listing that we'll be doubling the objects made when offheap reading. Is that right?

          Say in a read we deal with N HFileBlocks; we will have extra MBB objects created for each block. But per cell we won't create any new objects. In comparators etc. we check hasArray() and based on that use the buffer/array based APIs. When creating BB backed cells from an HFileBlock which is backed by an MBB, we try our best to refer to the original BB (an item in the MBB) and not create/duplicate extra BBs. But yes, some extra objects will be there (duplicated BBs). I can give a count based on a test scenario, Stack. Was in the middle of something else and missed doing this.

          have to read from the MemStore so this means that read path can be a mix of onheap and offheap results?

          yes

          or maybe the holes have been plugged by 'Using getXXXArray() would throw UnSupportedOperationException. '? And....

          Yep. If the Cell impl is backed by a BB (on heap/off heap), its getXXXArray APIs will throw UnsupportedOperationException.

          So, you might want to underline this point. Its BB but WE are managing the position and length to save on object creation and to bypass BB range checking, etc

          Yes, correct.

          Client won't be offheaping? If so, could the BB APIs be mixed in to Cell on the server only?

          Something like a ServerCell which extends Cell? Sounds reasonable. We have had some discussion like this also.

          So, why have the switch at all? The hasArray switch? Why not BB it all the time? It would simplify the read path. Disadvantage would be it'd be extra objects?

          Yes, the extra BB wrapper has to be created every time one calls getXXXArray(). It is an extra object creation plus some ops (like limit, pos checks) which happen in the BB classes. That is a bit costly. Had done some unit tests. Ram, do you have the numbers?

          When you say this: "Note that even if the HFileBlock is on heap BB we do not support getXXXArray() APIs. " This is only if hasArray returns false, right?

          Yes, when hasArray returns false. The point is, when the Cell is backed by a buffer we will have hasArray as false (whether DBB or HBB).

          Tell us more about the unsafe manipulation of BBs? How's that work?

          It reads data from the BB bypassing the BB APIs, reading directly from memory. HBASE-12345 has a patch which adds an Unsafe based compare for data in a BB. Similar is added for reading int/long etc. Same as we do for bytes in Bytes.java.

          stack stack added a comment -

          Vladimir Rodionov

          Sortof.

          This effort takes us further along a couple of paths. There is the foreground being able to have most of the data offheap when reading. An ancillary is proving an alternate Cell implementation is possible, one that is other than KeyValue.

          After the lads have the above behind us, we can move to the next interesting challenges. For example, a PrefixTree Cell implementation that keeps the key and value encoded/compressed as we traverse the read path.

          Regards the particular point you raise, yeah, currently we would have to decompress to put this read-path on top of it. Would be cool if decompress could be done with native code before we brought the block into BC.

          TODO

          vrodionov Vladimir Rodionov added a comment -

          The change is when the data is cached in the non compressed form (But can be in the DBE form)

          Anoop Sam John, if the goal of this JIRA is performance improvement, have you estimated the following scenarios:

          1. DBE - ON, block compression - OFF (byte array end-to-end - BA)
          2. DBE - OFF, block compression - OFF (BA)
          3. DBE - ON, block compression - ON (BA)
          4. DBE - OFF, block compression - ON (BA)
          5. DBE - ON, block compression - OFF (byte buffer end-to-end - BB)
          6. DBE - OFF, block compression - OFF (BB)
          7. DBE - ON, block compression - ON (BB)
          8. DBE - OFF, block compression - ON (BB)

          You optimize use case No. 5 only. Do you think it is going to be faster than any of the first four (BA)? People like compression, especially if it does not affect benchmark performance too much and helps them with their application. Essentially, you advise users not to use compression and to use only DBE, but all scan operations and get/multi get operations take a performance hit with DBE enabled. I think No. 4 (DBE - OFF, block compression ON, byte array backed) is going to be faster than No. 5 (DBE - ON, block compression OFF, byte buffer backed) in any benchmark.

          These are my words of caution ... before you start such a large project, make sure that the benefits you are hoping for are really achievable.

          vrodionov Vladimir Rodionov added a comment -

          Would be cool if decompress could be done with native code before we the brought the block into BC.

          bigbase.org does that. Block Cache compression is all native. Unfortunately, I do not have time to continue working on this project now; maybe in the near future.

          vrodionov Vladimir Rodionov added a comment -

          I think No. 4 (DBE - OFF, block compression ON, byte array backed) is going to be faster than No. 5 (DBE - ON, block compression OFF, byte buffer backed) in any benchmark.

          In almost any. Scans with lots of skips will be faster when block compression is off, I think.

          stack stack added a comment -

          I wanted to come to this topic. This is hard coded now. Can this be made configurable? If this can be larger value,like block size, better as per our changes

          Yeah, configurable sounds right but thinking on it more, rare will be the case that the fuzzy hfile block will fit into the BC hard-coded block.

          May be that should be tried after a delay?

          Would this be a new block type? One that is not backed by BC? No on delay. Flag it's happening and move on. It'd be an offheap allocation I suppose, so that will be delay enough (smile).

          Am I making it clear?

          Yes.

          Some thing like a ServerCell which extend Cell?

          We can't give users an API that has you get data from a BB but you need to use the enclosing Cell to figure where to read from and how much. Users will kick us out!

          Yes the extra BB wrapper which has to be created every time one calls getXXXArray().

          Would be interesting to see cost. Would be sweet if only one readpath... but I'd imagine the perf difference will be too great so we'll have to have two.

          It reads data from BB bypassing the BB APIs.

          Add this to doc.

          Thanks.

          stack stack added a comment -

          I like Vladimir Rodionov's grid. If only for illustration of where you are focused, suggest you add it to the doc, Anoop Sam John. Would be good to get some answers on his questions too.

          BigBase open source, Vladimir?

          vrodionov Vladimir Rodionov added a comment -

          BigBase open source Vladimir?

          Yes, but it is not Apache, yet.

          ram_krish ramkrishna.s.vasudevan added a comment -

          Should bucket size be same as the hfile block size?

          Yes, that would be better in many cases; however, the odd block may go beyond the hfile block size.

          Can MBB be developed in isolation with tests and refcounting tests apart from main code base? Is that being done?

          We need some tests for the refcounting part. Apart from that they can be individual tasks as Anoop says.

          Reg. the BB and comparators having two paths: that would be the ideal way as per the profiler reports. That is because all the KVs coming from the HFiles are Buffer backed cells, while the cells in the memstore are byte[] backed. So, as mentioned in the doc, if we try to create only BB based rows, families and qualifiers, we may have to wrap these byte[]s, which is a costlier operation. Also, in the case of fake keys it is always better to create them in byte[] rather than in a BB, because for BBs we have to do an allocation and then copy the contents. All of these are costlier.
          Hence, when we create a fake key and compare it against a key from an HFile, we have two versions of cells, one backed by byte[] and another by a BB. So if/else based comparisons would be better.
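
          Roughly, the comparator branches once per cell on whether the cell is array backed, something along these lines. Cell and Bytes are the existing HBase classes; BufferBackedCell and the compareBuffer* helpers are the hypothetical sketches from earlier in this issue.

{code:java}
// Sketch of the two-path compare: memstore cells are byte[] backed, cells from cached
// HFile blocks are buffer backed, and the hot path avoids wrapping byte[]s into BBs.
int compareRows(Cell left, Cell right) {
  if (left instanceof BufferBackedCell && right instanceof BufferBackedCell) {
    BufferBackedCell l = (BufferBackedCell) left;
    BufferBackedCell r = (BufferBackedCell) right;
    return compareBuffers(l.getRowByteBuffer(), l.getRowPosition(), l.getRowLength(),
        r.getRowByteBuffer(), r.getRowPosition(), r.getRowLength());
  }
  if (left instanceof BufferBackedCell) {
    BufferBackedCell l = (BufferBackedCell) left;
    return compareBufferToArray(l.getRowByteBuffer(), l.getRowPosition(), l.getRowLength(),
        right.getRowArray(), right.getRowOffset(), right.getRowLength());
  }
  if (right instanceof BufferBackedCell) {
    BufferBackedCell r = (BufferBackedCell) right;
    // Flip the sign because the operands are swapped.
    return -compareBufferToArray(r.getRowByteBuffer(), r.getRowPosition(), r.getRowLength(),
        left.getRowArray(), left.getRowOffset(), left.getRowLength());
  }
  return Bytes.compareTo(left.getRowArray(), left.getRowOffset(), left.getRowLength(),
      right.getRowArray(), right.getRowOffset(), right.getRowLength());
}
{code}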

          Reg the Unsafe comparators,
          They are just the same as in byte[] array now.

          So, you might want to underline this point. Its BB but WE are managing the position and length to save on object creation and to bypass BB range checking, etc.

          Yes. That is an important decision we had to make. One objective is to reduce object creation and another is to use the same APIs for offset and length.

          Client won't be offheaping? If so, could the BB APIs be mixed in to Cell on the server only?

          We discussed that ServerCell concept. But I would argue not do that because then the user would have two types of Cells - one on the write path and the other cell on the read path. I would say that would make things more complex and not much ease of use too.

          I will try to make a trunk based patch and upload it for reference.

          ram_krish ramkrishna.s.vasudevan added a comment -

          Adding to what Anoop says, we are not trying to change where we do compression; that part remains the same. It is only after we start using an HFileBlock that we use it in offheap mode, and in particular we avoid the copy that happens in the BucketCache every time a block needs to be used (in this case it is going to be a decompressed block only).
          One point to note here is that in the case of DBE it is an encoded block - we still go on with the encoded block only, and the existing logic for decoding the block still works the same way.
          In the existing code there are two copies that happen here - one from the BucketCache and the other in the DBE algo.
          Now we try to avoid the first one.

          ram_krish ramkrishna.s.vasudevan added a comment -

          Would be good to get some answers on his questions too.

          I could take that IA. Get some tests in this area to be more clear on this.

          If only for illustration of where you are focused, suggest you add to the doc

          Sure. Would also ensure the other points that were specifically discussed get added to the doc.

          anoop.hbase Anoop Sam John added a comment -

          We discussed that ServerCell concept. But I would argue not do that because then the user would have two types of Cells - one on the write path and the other cell on the read path. I would say that would make things more complex and not much ease of use too.

          No, what I mean here is that the Cell interface won't change and on the client side the user will still interact with Cell. ServerCell is an extension of Cell which exists only on the server side, in both read and write paths. The only thing is that CPs and Filters will then get a new type: ServerCell instead of Cell.

          stack stack added a comment -

          ramkrishna.s.vasudevan

          On this "But I would argue not do that because then the user would have two types of Cells - one on the write path and the other cell on the read path.", expecting clients to use BB in a wonky way is not on (smile). I think Anoop Sam John cleaned up what is meant.

          anoop.hbase Anoop Sam John added a comment -

          You are talking about the case hbase.block.data.cachecompressed = true, right? Yes, this recently added feature allows keeping block data in compressed form in the BC. When such a block is read from the BC, this happens:
          Step 1: We create a new onheap buffer and copy the compressed data from the buckets into it. Make an HFileBlock backed by this compressed data.
          Step 2: Unpack this block. We will create a new byte[] with size equal to the uncompressed data size for this block. The compression algo will uncompress the block data into this new buffer.

          With our changes we avoid the new buffer and the copy needed in Step 1. We create a block backed by the MBB. We have a new InputStream over the MBB and pass that for uncompress. Yes, Step 2 will still be there.
          So there is an advantage here also. Does that make sense?
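
          The InputStream piece of that is small; something like the sketch below (over a single ByteBuffer for brevity, where the real one would wrap the multi-buffer block). This is illustrative only, not the committed class.

{code:java}
import java.io.InputStream;
import java.nio.ByteBuffer;

// Sketch: an InputStream view over a (possibly offheap) ByteBuffer, so a decompressor
// can consume cached block bytes without first copying them into an onheap byte[].
// Reads go through a duplicate so the cached buffer's position/limit are untouched.
final class ByteBufferInputStream extends InputStream {
  private final ByteBuffer buf;

  ByteBufferInputStream(ByteBuffer buf) {
    this.buf = buf.duplicate();
  }

  @Override
  public int read() {
    return buf.hasRemaining() ? (buf.get() & 0xff) : -1;
  }

  @Override
  public int read(byte[] b, int off, int len) {
    if (!buf.hasRemaining()) {
      return -1;
    }
    int n = Math.min(len, buf.remaining());
    buf.get(b, off, n);
    return n;
  }

  @Override
  public int available() {
    return buf.remaining();
  }
}
{code}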

          anoop.hbase Anoop Sam John added a comment -

          You mean that for all these cases the data is already cached and comes from the BC? Or is it a direct read from DFS?

          anoop.hbase Anoop Sam John added a comment -

          Attaching an E2E patch for reference. We are still doing some more cleanups, and there is still some code duplication in the patch that we are removing.

          ram_krish ramkrishna.s.vasudevan added a comment -

          Not able to add to RB. The RB tool hangs when we try to add a patch.

          ram_krish ramkrishna.s.vasudevan added a comment -

          Any comments/suggestions on the attached E2E patch? We are working on refining the patch to cover some more cases where we need to clearly distinguish the getXXXArray and getXXXBB methods. Suggestions on the above patch would help us greatly.
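
          For context, the dual-path pattern that the getXXXArray vs getXXXBB distinction forces on callers looks roughly like the sketch below. This is a hypothetical helper, reusing the ServerCell idea sketched in an earlier comment, not a utility from the patch:

          import java.nio.ByteBuffer;
          import org.apache.hadoop.hbase.Cell;

          // Hypothetical helper showing the array-vs-buffer dispatch discussed above.
          public final class CellValueReader {
            private CellValueReader() {}

            // Copy the cell's value out, whether it is backed by a byte[] or a ByteBuffer.
            public static byte[] cloneValue(Cell cell) {
              byte[] out = new byte[cell.getValueLength()];
              if (cell instanceof ServerCell) {               // hypothetical BB-backed type
                ServerCell sc = (ServerCell) cell;
                ByteBuffer vb = sc.getValueByteBuffer().duplicate();
                vb.position(sc.getValuePosition());
                vb.get(out, 0, out.length);                   // read via the buffer API
              } else {
                System.arraycopy(cell.getValueArray(), cell.getValueOffset(),
                    out, 0, out.length);                      // classic array path
              }
              return out;
            }
          }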

          ram_krish ramkrishna.s.vasudevan added a comment -

          Updated patch handling the ByteBuffer and byte[] cases throughout the read path, with some refactoring also done. It may still not be the final patch, but it is more suitable for review.

          ram_krish ramkrishna.s.vasudevan added a comment -

          RB link https://reviews.apache.org/r/32687/

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12708404/HBASE-11425.patch
          against master branch at commit f1f4b6618334767d0da0f47965309b21676e7e9f.
          ATTACHMENT ID: 12708404

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 246 new or modified tests.

          +1 hadoop versions. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0)

          -1 javac. The applied patch generated 63 javac compiler warnings (more than the master's current 46 warnings).

          +1 protoc. The applied patch does not increase the total number of protoc compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 14 warning messages.

          -1 checkstyle. The applied patch generated 2050 checkstyle errors (more than the master's current 1926 errors).

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces the following lines longer than 100:
          + ByteBufferUtils.copyFromBufferToByteArray(val, getValueBuffer(), getValueOffset(), 0, getValueLength());
          + ByteBufferUtils.copyFromBufferToByteArray(fam, getFamilyBuffer(), getFamilyOffset(), 0, getFamilyLength());
          + ByteBufferUtils.copyFromBufferToByteArray(qual, getQualifierBuffer(), getQualifierOffset(), 0, getQualifierLength());
          + ByteBufferUtils.copyFromBufferToByteArray(row, getRowBuffer(), getRowOffset(), 0, getRowLength());
          + ByteBufferUtils.copyFromBufferToByteArray(minimumMidpointArray, right, rightOffset, 0, diffIdx + 1);
          + ByteBufferUtils.copyFromBufferToByteArray(minimumMidpointArray, left, leftOffset, 0, diffIdx);
          + ByteBufferUtils.copyFromBufferToByteArray(minimumMidpointArray, right, rightOffset, 0, diffIdx + 1);
          + return matchingFamily(left, left.getFamilyOffset(), left.getFamilyLength(), buf, offset, length);
          + cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(), cell.getQualifierArray(),
          + public static int findCommonPrefixInQualifierPart(Cell left, Cell right, int qualifierCommonPrefix) {

          +1 site. The mvn site goal succeeds with this patch.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/13505//testReport/
          Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13505//artifact/patchprocess/newFindbugsWarnings.html
          Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/13505//artifact/patchprocess/checkstyle-aggregate.html

          Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13505//artifact/patchprocess/patchJavadocWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/13505//console

          This message is automatically generated.

          stack stack added a comment -

          The patch is too big. I'm 7 pages in on a 13 page review. Could we piecemeal it?

          anoop.hbase Anoop Sam John added a comment -

          Basically we have to split it into sub-patches. We posted this so that there could be a look at, and an understanding of, the general approach; I think you got to that. Anyway, we got some good feedback and we are working on that. I am doing the ServerCell stuff; in a day or so I can put up something so that we can see what the changes are. Please stay tuned.

          Anoop

          ram_krish ramkrishna.s.vasudevan added a comment -

          Yes Stack. We have plans to split up the patch, but the most challenging part is how to split it up and in what order.
          A few tasks like the Cell API changes, ref counting, and the MultiByteBuffer class can all be added individually, but the rest are all intertwined. We can think of a strategy for doing it.

          Thanks for having a look. We are working on the review comments in parallel too.

          ram_krish ramkrishna.s.vasudevan added a comment -

          Also, this patch will need certain public-facing APIs to be deprecated and new ones to be added. That can be done as a separate task too.

          stack stack added a comment -

          So, what is going on with this patch now? You want the CellComparator patch, HBASE-10800, in first? Let me look at the ServerCell patch too.

          anoop.hbase Anoop Sam John added a comment -

          Yes, I would say we deal with HBASE-10800 first and then the ServerCell and then come to this Jira.

          stack stack added a comment -

          Took another read of the doc (and above comments on it).

          1. (Continuing from the comments above), suggest adding to the doc an estimate of how many extra objects will be made going this route, and Vladimir's grid to show what you are focused on.
          2. Did you fellas have a look at how others do offheaping or if there were libs you could have made use of? Would have been good to include notes on your findings in here.
          3. The section on hasArray (if hasArray is false, it seems to imply hasByteBuffer is true) and the discussion of the added APIs (when they come into effect and when they throw unsupported exceptions) will need a rewrite in light of the feedback above and a review of the recent patches (the API method names I think we've cleaned up too).
          4. Sounds like you fellas looked at netty ByteBuf too. Add in your findings I'd say.
          5. Would have liked to have more detail around the RPC findings. You think it could be different now we have buffer reuse? We could save making a cellblock?
          6. Looking at diagrams for perf, I can't tell if more is better or not. Suggest you write up a summary of what the diagrams are showing.
          7. This feature when on, will be for whole server, right? Can't do by table or region, right?

          Thanks lads.

          ram_krish ramkrishna.s.vasudevan added a comment -

          We will update all the other things as you have said in the doc.

          You think it could be different now we have buffer reuse? We could save making a cellblock?

          We can try once again with buffer reuse. But writing multiple cells individually to the socket was still the time-consuming factor, and that was reduced when we created a cell block.
          Will come back to the other comments shortly. Thanks Stack.

          stack stack added a comment -

          Anoop Sam John The write up is excellent – especially the bit where you dumb it down listing out conclusion at end of each test and do up the table with all results. Suggest you do an edit, add a conclusion, and post the jmh files on this issue (I could not open them from the doc) rather than in the doc, and then post a note to dev list pointing at these findings since we are going to build on top of them going forward (we should put it on hbase blog?). Very nice.

          Here are some comments on the doc that might help w/ the edit.

          Suggest you add sentence after first on why we want to go offheap (this will make the doc 'standalone')

          Change: "When we support E2E off heap support and in turn support Cell also backed by off heap memory, we have to make sure to select the best performing data structure/framework for this off heap storage."

          to

          "When we implement E2E off heap support, we have to make sure to select the best performing data structure/framework."

          s/below test is/below tests are/

          s/pros like, our PRC layer/pros such as our RPC layer already/

          s/HDFS/an HDFS/

          Change "But the NIO Buffer APIs have complaints over its performance and methods not inlineable"

          to

          But NIO ByteBuffers can be slow (boundary checks and/or some methods may not inline).

          Change: "This make us to think for netty also as it seems better performing. (Really? We have test results below)"

          to

          "This makes us look to Netty ByteBuf as a possibly better performing alternative."

          On first test, add sentence comparing difference between onheap and offheap runs (onheap looks 30% slower compared to onheap?) Do same comparing jdk8 to jdk7? (Hmm... yeah, would like to see the code... smile)

          s/Similar way/Similarly/

          s/The next test cmopare/The next test compares/

          s/come almost/come out almost/

          s/test what I/test that I/
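
          For readers without the doc, the benchmarks referred to above are jmh microbenchmarks comparing onheap and offheap ByteBuffer reads (and Netty ByteBuf). A minimal jmh harness of that shape is sketched below; the class name and block size are assumptions for illustration, not the actual benchmark code attached to this issue:

          import java.nio.ByteBuffer;
          import java.util.concurrent.TimeUnit;

          import org.openjdk.jmh.annotations.Benchmark;
          import org.openjdk.jmh.annotations.BenchmarkMode;
          import org.openjdk.jmh.annotations.Mode;
          import org.openjdk.jmh.annotations.OutputTimeUnit;
          import org.openjdk.jmh.annotations.Scope;
          import org.openjdk.jmh.annotations.Setup;
          import org.openjdk.jmh.annotations.State;

          @BenchmarkMode(Mode.AverageTime)
          @OutputTimeUnit(TimeUnit.NANOSECONDS)
          @State(Scope.Thread)
          public class ByteBufferReadBench {
            private static final int SIZE = 64 * 1024;   // roughly one HFile block

            private ByteBuffer onheap;
            private ByteBuffer offheap;

            @Setup
            public void setup() {
              onheap = ByteBuffer.allocate(SIZE);
              offheap = ByteBuffer.allocateDirect(SIZE);
            }

            @Benchmark
            public long readOnheap() {
              long sum = 0;
              for (int i = 0; i + 8 <= SIZE; i += 8) {
                sum += onheap.getLong(i);   // absolute gets, no position mutation
              }
              return sum;
            }

            @Benchmark
            public long readOffheap() {
              long sum = 0;
              for (int i = 0; i + 8 <= SIZE; i += 8) {
                sum += offheap.getLong(i);
              }
              return sum;
            }
          }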

          stack stack added a comment -

          Anoop Sam John and ramkrishna.s.vasudevan as per chat this morning, post the original word doc and I'll stick it up in a shared google doc so we can all comment/bang on it... or if you are able, you post it as a google doc. Thanks lads.

          anoop.hbase Anoop Sam John added a comment -

          https://docs.google.com/document/d/1WHLYmccHw28itox4qdeXRgH5SZkt_zruSzngcmmBiUs/edit?usp=sharing

          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK #6701 (See https://builds.apache.org/job/HBase-TRUNK/6701/)
          HBASE-14188 - Read path optimizations after HBASE-11425 profiling (Ram) (ramkrishna: rev 7a9e10dc11877420c53245c403897d746bebc077)

          • hbase-common/src/main/java/org/apache/hadoop/hbase/OffheapKeyValue.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderImpl.java
          • hbase-common/src/test/java/org/apache/hadoop/hbase/TestOffheapKeyValue.java
          • hbase-server/src/test/java/org/apache/hadoop/hbase/filter/FilterAllFilter.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/SizeCachedNoTagsKeyValue.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/SizeCachedKeyValue.java
          • hbase-common/src/main/java/org/apache/hadoop/hbase/nio/MultiByteBuff.java
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK #6717 (See https://builds.apache.org/job/HBase-TRUNK/6717/)
          HBASE-14188- Read path optimizations after HBASE-11425 profiling- (ramkrishna: rev aa3538f80278f5c0ba1bc8ca903066fa02ac79ec)

          • hbase-server/src/test/java/org/apache/hadoop/hbase/filter/FilterAllFilter.java
          stack stack added a comment -

          Some coarse graphs from running YCSB workload c (totally random read) for an hour with 100 clients against a dataset that is fully cached, hosted on one server. The first run is against a RS that is using the default, onheap cache. The second is using bucketcache.

          I see that the work here makes it so that using the bucketcache gives the same latency and throughput (perhaps a little less throughput) as serving everything from onheap (recall that in earlier tests, bucketcache was best if there were cache misses; if you could serve everything from heap, onheap had a much nicer profile). To me, this makes it possible to run with the bucketcache all the time, whether serving everything from cache or when there are cache misses (recall, bucketcache did better when there were cache misses; I have not looked to see if this work improves on what we saw previously).

          More testing to follow (a redo of our block cache comparisons post might be in order).

          The graphs are the basic gc profile (this is CMS), gets per second, the median (the 75th and 95th percentiles weren't showing up for some reason... need to dig in... hopefully it's because their incidence was low...), and overall loading and seeks.

          Offheap puts a little more load on the system, has a better GC profile, and gives slightly less throughput.

          stack stack added a comment -

          Have you fellas run this for a while? I seem to OOME easily enough. I tried with a small heap, 4G, but it OOME'd, and then I had to keep going back up to 16G again even though I had a big offheap. I tried to capture the increasing use of heap but this diagram is the best I got... I've not done heap analysis.. but in the diagram you can see heap use start to rise and then plummet... now I am crawling, doing Full GCs every couple of seconds.

          The last meaningful log line was this, in the regionserver log:

          2015-10-14 16:51:10,058 DEBUG [main-BucketCacheWriter-2] bucket.BucketCache: This block 3f0157e7daee45fdb25202c496c95c46_1649898813 is still referred by 1 readers. Can not be freed now

          It started complaining 50 minutes ago...

          ram_krish ramkrishna.s.vasudevan added a comment -

          Have you fellas run this for a while? I seem to OOME easily enough. I tried with a small heap, 4G, but it OOME'd, and then I had to keep going back up to 16G again even though I had a big offheap. I tried to capture the increasing use of heap but this diagram is the best I got... I've not done heap analysis.. but in the diagram you can see heap use start to rise and then plummet... now I am crawling, doing Full GCs every couple of seconds.

          Argh!! We have run some read-related workloads for an hour or so but did not observe any OOMEs, though our heap size was set at 32G. That was with YCSB, and we did not observe any full GCs.
          In the case of the PE tool we used only a 9G heap and a 16G offheap size, but did not observe any fluctuations in the heap usage.

          2015-10-14 16:51:10,058 DEBUG [main-BucketCacheWriter-2] bucket.BucketCache: This block 3f0157e7daee45fdb25202c496c95c46_1649898813 is still referred by 1 readers. Can not be freed now

          Maybe because of the OOME some error handling is not done. Let us install a cluster internally to test this.

          ram_krish ramkrishna.s.vasudevan added a comment -

          BTW, are you seeing this behaviour while running a pure read workload, i.e. YCSB workload c?

          stack stack added a comment -

          Workload c. Try 4g. I don't see why 4g should not be enough when 7//regions and 8g of offheap and all cache hits. I could be wrong. I have not dug in...

          ram_krish ramkrishna.s.vasudevan added a comment -

          Tried out the experiments in a new, single-node cluster.
          Loaded around 15G of data with 10 regions.
          Initially configured 4G of heap space, 5G of offheap space and a bucket cache size of 4G.
          Ran a pure read workload c for 30 mins with 50 threads. I am not running into the OOME, and the block eviction part is also fine. The bigger GCs are around 450ms to 500ms. Repeated the same experiment with 10G of heap space also. With this configuration we are sure that evictions are happening from the bucket cache. Attaching a GC snapshot of 5 mins captured during the workload test.
          Stack,
          Also, in your experiment I think your data does not fit into the bucket cache and hence it is trying to evict. Or, if it is fitting into the bucket cache, probably there was a file that was being compacted and a reader referencing it, and because of the OOME the ref count decrement did not happen and the forceful eviction was failing. Will keep checking this. Can you attach any logs from when this happened? Easier to debug (hopefully).

          ram_krish ramkrishna.s.vasudevan added a comment -

          Just checking the code

          Or, if it is fitting into the bucket cache, probably there was a file that was being compacted and a reader referencing it, and because of the OOME the ref count decrement did not happen and the forceful eviction was failing.

          This should not be the case. Anyway, logs will be better. Trying to reproduce this case if possible.

          carp84 Yu Li added a comment -

          From the "Offheap reads in HBase using BBs_V2.pdf" doc, we could see a kind of big perf gap between BucketCache and LRUCache, excerpt as below:

          Multiple Gets with 25 threads
          
                          AvgRT     95th   99th
          BucketCache     111.81    130    133
          LRUCache        23.49     34     39
          

          But from the latest comments here it seems this data is out of date, right? Mind updating the doc with the latest perf data? Thanks.

          ram_krish ramkrishna.s.vasudevan added a comment -

          Yes, we can. We will test once more with LRU and bucket cache and see how much difference we have. Thanks.

          ram_krish ramkrishna.s.vasudevan added a comment - - edited

          I could run with even 2G of heap without OOME.

          anoop.hbase Anoop Sam John added a comment -

          Yu Li
          I have done a quick multi-get test using the PE tool default settings (i.e. 1 GB of data).
          Multi-get with 100 rows and 25 client threads, each doing the op 100000 times.
          Avg completion time for each thread:
          On heap LRU Cache (L1): 9492ms
          Off heap Bucket Cache (L2): 9596ms
          So it is almost the same.

          carp84 Yu Li added a comment -

          Thanks for the update Anoop Sam John, the numbers look great, nice work!

          It's a pity that we have to wait until 2.0 is released to take advantage of this, maybe months away...

          stack stack added a comment -

          Seems easy for me to reproduce. Just happened again. Here is the log.

          stack stack added a comment -

          The log is not that interesting. It does not even have the eviction issue. Just shows us struggling w/ GC. Finally we OOME here:

          2015-10-16T13:16:42.032-0700: [Full GC (Allocation Failure) 2015-10-16T13:16:42.032-0700: [CMS: 15276447K->15276422K(15276480K), 4.9700400 secs] 16273215K->16273119K(16273280K), [Metaspace: 48934K->48934K(1093632K)], 4.9774537 secs] [Times: user=4.97 sys=0.00, real=4.98 secs]
          2015-10-16T13:16:47.012-0700: [Full GC (Allocation Failure) 2015-10-16T13:16:47.012-0700: [CMS: 15276422K->15276422K(15276480K), 1.2150090 secs] 16273204K->16273186K(16273280K), [Metaspace: 48901K->48901K(1093632K)], 1.2151393 secs] [Times: user=1.21 sys=0.00, real=1.22 secs]
          2015-10-16T13:16:48.227-0700: [Full GC (Allocation Failure) 2015-10-16T13:16:48.227-0700: [CMS: 15276422K->15276422K(15276480K), 1.2185531 secs] 16273186K->16273186K(16273280K), [Metaspace: 48901K->48901K(1093632K)], 1.2186671 secs] [Times: user=1.22 sys=0.00, real=1.21 secs]
          # java.lang.OutOfMemoryError: Java heap space
          # -XX:OnOutOfMemoryError="kill -9 %p"
          #   Executing /bin/sh -c "kill -9 16941"...

          Configs are this:

          # The maximum amount of heap to use, in MB. Default is 1000.
          export HBASE_HEAPSIZE=16000

          export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize=16g"

          <property>
          <name>hbase.bucketcache.ioengine</name>
          <value>offheap</value>
          </property>
          <property>
          <name>hbase.bucketcache.size</name>
          <value>8196</value>
          </property>
          <property>
          <name>hfile.block.cache.size</name>
          <value>0.1</value>
          </property>
          </configuration>

          Looking at UI, hardly any meta blocks in L1... a couple of hundred.

          stack stack added a comment -

          My master branch is at this stage:

          commit 1458798eb593358fe5415596b2958f2f7e451ea5
          Author: stack <stack@apache.org>
          Date: Tue Oct 13 15:16:57 2015 -0700

          HBASE-14600 Make #testWalRollOnLowReplication looser still

          stack stack added a comment -

          Does this help? Looks like 7.5G retained by HFileScannerImpl in arraylist.

          anoop.hbase Anoop Sam John added a comment -

          Is the log from the time of workload C (pure reads alone)? I can see a lot of flush and compaction logs in it. Can you give the exact steps to reproduce, boss? That will help us reproduce and fix it.

          What we do is:
          Start the cluster and pump in the whole dataset. Then stop the cluster so that all the data is flushed and on disk.
          Start the cluster again. Do a full table scan so that the entire dataset is read once and gets cached in the BC.
          Now run YCSB workload C.

          stack stack added a comment -

          This is a different loading. This is me trying to run a suite of ycsb load and workload runs, but I'd forgotten to disable the bucketcache.

          It should not OOME. If it does, it's broke?

          See the attached image for what is holding objects.

          ram_krish ramkrishna.s.vasudevan added a comment - - edited

          In the latest tests we did, we ensured that after the loading was done we simply ran the YCSB read workloads, loading the block cache while at the same time getting lots of evictions, since the configured bucket cache is smaller in size than the available data.

          anoop.hbase Anoop Sam John added a comment -

          One doubt after seeing the logs... does the OOME come at a time when some compaction is happening?
          There was one open item we discussed around the shipped() call during compaction. This API gets called in the scan flow after we ship a set of rows back to the client, so all the blocks we came across during the scan, other than the current block, can get released (ref count decrements).
          But during compaction this call is not happening at all; only one close happens at the end. We can correct this. I have a quick patch for that. Would you be interested in seeing it and testing with it once, boss?
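
          For illustration, the ref counting protocol being described (retain a block when a scanner picks it up, release it on shipped()/close()) is sketched below. The class and method names are assumptions for this sketch, not the actual code in the patch:

          import java.util.ArrayList;
          import java.util.List;
          import java.util.concurrent.atomic.AtomicInteger;

          // Hypothetical sketch of the block ref counting discussed above.
          class RefCountedBlock {
            private final AtomicInteger refCount = new AtomicInteger(0);

            void retain() { refCount.incrementAndGet(); }         // a scanner starts using the block
            void release() { refCount.decrementAndGet(); }        // the scanner is done with it
            boolean evictable() { return refCount.get() == 0; }   // the cache may free/evict only then
          }

          // Scanner-side bookkeeping: blocks touched since the last shipped()/close().
          class ScannerBlockTracker {
            private final List<RefCountedBlock> checkedOut = new ArrayList<>();

            void onBlockUsed(RefCountedBlock block) {
              block.retain();
              checkedOut.add(block);
            }

            // Called after a batch of rows is shipped to the client in the scan flow;
            // the compaction flow described above was only doing this at its final close().
            void shipped() {
              for (RefCountedBlock b : checkedOut) {
                b.release();
              }
              checkedOut.clear();
            }
          }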

          anoop.hbase Anoop Sam John added a comment -

          It should not OOME. If it does, it's broke?

          Fully agree. We need to fix this and it is a critical bug. I was just trying to understand the flow so that we can reproduce it easily here.

          ram_krish ramkrishna.s.vasudevan added a comment -

          Moved the work for the Dictionary (HBASE-14841) and Prefix Tree (HBASE-14842) to work with ByteBuffers out to HBASE-15179, as it is mostly concerned with writes. Hence resolving this parent JIRA as fixed.

          stack stack added a comment -

          Hurray!


            People

            • Assignee:
              anoopsamjohn Anoop Sam John
              Reporter:
              anoop.hbase Anoop Sam John