HBase / HBASE-11425

Cell/DBB end-to-end on the read-path

    Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: regionserver, Scanners
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      For an end-to-end (E2E) off-heap read path, first configure an off-heap backed BucketCache (BC). Set 'hbase.bucketcache.ioengine' to offheap in hbase-site.xml and specify the total capacity of the BC using 'hbase.bucketcache.size'. Remember to adjust the value of 'HBASE_OFFHEAPSIZE' in hbase-env.sh to match this capacity. That setting specifies the maximum off-heap memory the RS Java process may allocate, so it must be larger than the off-heap BC size. Note that 'hbase.bucketcache.ioengine' has no default, which means the BC is turned OFF by default.

      The next thing to tune is the ByteBuffer pool on the RPC server side. Buffers from this pool are used to accumulate the cell bytes and create the result cell block that is sent back to the client. Use 'hbase.ipc.server.reservoir.enabled' to turn this pool ON or OFF. By default the pool is ON: HBase creates off-heap ByteBuffers and pools them. Do not turn it OFF if you want an E2E off-heap read path. If the pool is turned off, the server creates temporary on-heap buffers to accumulate the cell bytes and build the result cell block, which can increase GC pressure on a server with a heavy read load. You can tune both the number of buffers in the pool and the size of each ByteBuffer.
      Use the config 'hbase.ipc.server.reservoir.initial.buffer.size' to set the size of each buffer. The default is 64 KB.

      When the read pattern is random row reads and each row is much smaller than 64 KB, try reducing this size. When a result is larger than one ByteBuffer, the server grabs more than one buffer and builds the result cell block out of them. When the pool runs out of buffers, the server falls back to creating temporary on-heap buffers.

      The maximum number of ByteBuffers in the pool can be tuned using the config 'hbase.ipc.server.reservoir.initial.max'. Its value defaults to 64 * the number of region server handlers configured (see the config 'hbase.regionserver.handler.count'). The reasoning is as follows: by default we assume a result cell block of 2 MB per read and that each handler is handling one read. A 2 MB result needs 32 buffers of 64 KB each (the default buffer size in the pool), so each handler needs 32 ByteBuffers (BBs). We allow twice that number as the max BB count, so that a handler can create a response, hand it to the RPC Responder thread, and then start handling a new request and building a new response cell block (using pooled buffers). Even if the responder cannot send back the first TCP reply immediately, this count should leave enough buffers in the pool to avoid creating temporary buffers on the heap. Again, for smaller random row reads, tune this max count. The buffers are created lazily, and the count is the maximum number that will be pooled.
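
The default sizing arithmetic above can be checked with a short sketch. The class and method names below are hypothetical and this is not HBase code; the handler count of 30 is only an example value for 'hbase.regionserver.handler.count'.

```java
// Sketch of the default reservoir sizing arithmetic from the release note.
// Names are hypothetical; this is not part of HBase.
final class ReservoirSizing {
    static final int DEFAULT_BUFFER_SIZE = 64 * 1024;        // 64 KB per pooled buffer
    static final long ASSUMED_RESULT_SIZE = 2L * 1024 * 1024; // 2 MB cell block per read

    /** Buffers needed to hold one result cell block, rounded up. */
    static int buffersPerResult(long resultBytes, int bufferBytes) {
        return (int) ((resultBytes + bufferBytes - 1) / bufferBytes);
    }

    /** Max pooled buffers: room for two in-flight responses per handler. */
    static int maxPooledBuffers(int handlerCount, long resultBytes, int bufferBytes) {
        return 2 * buffersPerResult(resultBytes, bufferBytes) * handlerCount;
    }

    public static void main(String[] args) {
        int handlers = 30; // example value, adjust to your configuration
        int perResult = buffersPerResult(ASSUMED_RESULT_SIZE, DEFAULT_BUFFER_SIZE);
        int max = maxPooledBuffers(handlers, ASSUMED_RESULT_SIZE, DEFAULT_BUFFER_SIZE);
        long poolBytes = (long) max * DEFAULT_BUFFER_SIZE;
        System.out.println("buffers per result = " + perResult);                  // 32
        System.out.println("max pooled buffers = " + max);                        // 1920
        System.out.println("pool capacity (MB) = " + poolBytes / (1024 * 1024));  // 120
    }
}
```

With 30 handlers the pool can grow to 1920 buffers, or 120 MB of off-heap memory, which is the figure to carry into the HBASE_OFFHEAPSIZE calculation.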

      The setting for HBASE_OFFHEAPSIZE in hbase-env.sh must also account for this off-heap buffer pool on the RPC side. Configure the max off-heap size for the RS to be somewhat higher than the sum of the max pool size and the off-heap cache size. The TCP layer also needs to create direct ByteBuffers for TCP communication, and the DFS client needs some off-heap memory for its own work, especially if short-circuit reads are configured. Allocating an extra 1 - 2 GB for the max direct memory size has worked in tests.
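
As a rough illustration of this budget, the sketch below adds up the off-heap cache size, the maximum reservoir size, and the extra headroom. All names are hypothetical, and the 8 GB cache, 1920 buffers, and 2 GB headroom are example inputs, not recommendations.

```java
// Sketch: estimating a value for HBASE_OFFHEAPSIZE from its components.
// Names and inputs are illustrative assumptions, not HBase code or defaults.
final class OffHeapBudget {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    /** Maximum off-heap memory the RPC ByteBuffer pool can hold. */
    static long reservoirBytes(int maxBuffers, int bufferBytes) {
        return (long) maxBuffers * bufferBytes;
    }

    /** Off-heap cache + RPC pool + headroom for the TCP layer and DFS client. */
    static long suggestedOffHeapBytes(long bucketCacheBytes, long reservoirBytes, long headroomBytes) {
        return bucketCacheBytes + reservoirBytes + headroomBytes;
    }

    public static void main(String[] args) {
        long bucketCache = 8 * GB;                         // example hbase.bucketcache.size
        long reservoir = reservoirBytes(1920, 64 * 1024);  // example max pool: 1920 x 64 KB
        long headroom = 2 * GB;                            // the "extra 1 - 2 GB" from the note
        long total = suggestedOffHeapBytes(bucketCache, reservoir, headroom);
        System.out.println("suggested off-heap size (MB) = " + total / MB); // 10360
    }
}
```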

      If you still see GC issues after making the E2E read path off heap, check whether the RPC buffer pool is being exhausted. Look for the following message at INFO level in the RS log:

        "Pool already reached its max capacity : XXX and no free buffers now. Consider increasing the value for 'hbase.ipc.server.reservoir.initial.max' ?"

      If you are using coprocessors (CPs) and refer to the Cells in the read results, DO NOT store references to these Cells beyond the scope of the CP hook methods. Sometimes a CP needs to store information about a Cell (such as its row key) for use in the next CP hook call. In such cases, please clone the required fields of the Cell as the use case requires. [ See CellUtil#cloneXXX(Cell) APIs ]
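
The hazard can be illustrated without HBase itself. In the sketch below, `BufferBackedRow` is a hypothetical stand-in for a Cell backed by a pooled off-heap buffer, and its `cloneRow()` plays the role that `CellUtil.cloneRow(Cell)` plays in a real coprocessor. It is a simplified model, assuming the backing buffer is reused once the hook returns.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Simplified model of why a Cell reference must not outlive a CP hook call.
// BufferBackedRow is hypothetical; it is not an HBase class.
final class CellCloneDemo {
    /** A view over a shared buffer, like a Cell backed by off-heap memory. */
    static final class BufferBackedRow {
        private final ByteBuffer buf;
        private final int offset;
        private final int length;

        BufferBackedRow(ByteBuffer buf, int offset, int length) {
            this.buf = buf;
            this.offset = offset;
            this.length = length;
        }

        /** Copies the row bytes onto the heap so they survive buffer reuse. */
        byte[] cloneRow() {
            byte[] copy = new byte[length];
            for (int i = 0; i < length; i++) {
                copy[i] = buf.get(offset + i); // absolute get, works on direct buffers
            }
            return copy;
        }

        String rowAsString() {
            return new String(cloneRow(), StandardCharsets.UTF_8);
        }
    }

    /** Overwrites the start of the buffer, simulating reuse for another read. */
    static void fill(ByteBuffer buf, String row) {
        byte[] bytes = row.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < bytes.length; i++) {
            buf.put(i, bytes[i]);
        }
    }

    public static void main(String[] args) {
        ByteBuffer shared = ByteBuffer.allocateDirect(16);
        fill(shared, "row-1");
        BufferBackedRow cell = new BufferBackedRow(shared, 0, 5);

        byte[] safeCopy = cell.cloneRow(); // cloned inside the "hook": safe to keep
        fill(shared, "row-2");             // buffer reused after the hook returns

        System.out.println("retained reference now reads: " + cell.rowAsString()); // row-2
        System.out.println("cloned row still reads: "
                + new String(safeCopy, StandardCharsets.UTF_8));                   // row-1
    }
}
```

The retained reference silently reports the wrong row after the buffer is reused, while the cloned bytes stay correct.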

      Description

      Umbrella jira to make sure we can have blocks cached in an off-heap backed cache. Throughout the read path, we can refer to this off-heap buffer and avoid on-heap copying.
      The high-level items I can identify as of now are:
      1. Avoid the array() call on BB in the read path. (This is there in many classes. We can handle it class by class.)
      2. Support Buffer-based getter APIs in Cell. In the read path we will create a new Cell backed by a BB. This will be needed in CellComparator, Filter (like SCVF), CPs, etc.
      3. Avoid KeyValue.ensureKeyValue() calls in the read path, as this makes a byte copy.
      4. Remove all CP hooks (which are already deprecated) that deal with KVs. (In the read path)
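
Item 1 follows from standard java.nio behaviour: a direct (off-heap) ByteBuffer has no accessible backing array, so array() throws. The sketch below is a minimal illustration using hypothetical helper names, not code from the HBase patches; it shows the failure and the position-based reads that work on both heap and direct buffers.

```java
import java.nio.ByteBuffer;

// Minimal illustration of why array() must be avoided on off-heap buffers.
// Helper names are hypothetical.
final class DirectBufferAccess {
    /** Reads a range with absolute gets, valid for heap and direct buffers. */
    static byte[] copyRange(ByteBuffer buf, int offset, int length) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            out[i] = buf.get(offset + i);
        }
        return out;
    }

    /** Returns true if calling array() on this buffer is unsupported. */
    static boolean arrayCallFails(ByteBuffer buf) {
        try {
            buf.array();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(8);
        direct.put(0, (byte) 42);

        System.out.println("direct.hasArray() = " + direct.hasArray());
        System.out.println("array() fails on direct buffer = " + arrayCallFails(direct));
        System.out.println("first byte via get(int) = " + copyRange(direct, 0, 1)[0]);
    }
}
```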

      Will add subtasks under this.

        Attachments

        1. Offheap reads in HBase using BBs_final.pdf
          94 kB
          ramkrishna.s.vasudevan
        2. Offheap reads in HBase using BBs_V2.pdf
          115 kB
          Anoop Sam John
        3. HBASE-11425-E2E-NotComplete.patch
          837 kB
          Anoop Sam John
        4. HBASE-11425.patch
          976 kB
          ramkrishna.s.vasudevan
        5. BenchmarkTestCode.zip
          10 kB
          Anoop Sam John
        6. Benchmarks_Tests.docx
          46 kB
          Anoop Sam John
        7. median.png
          41 kB
          stack
        8. gc.png
          31 kB
          stack
        9. load.png
          31 kB
          stack
        10. gets.png
          20 kB
          stack
        11. heap.png
          25 kB
          stack
        12. GC pics with evictions_4G heap.png
          55 kB
          ramkrishna.s.vasudevan
        13. ram.log
          8.46 MB
          stack
        14. Screen Shot 2015-10-16 at 5.13.22 PM.png
          442 kB
          stack

              People

              • Assignee: Anoop Sam John (anoopsamjohn)
              • Reporter: Anoop Sam John (anoop.hbase)
              • Votes: 0
              • Watchers: 33
