HBase / HBASE-27232

Fix checking for encoded block size when deciding if block should be closed


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha-3, 2.4.13
    • Fix Version/s: 2.6.0, 3.0.0-alpha-4
    • Component/s: None
    • Labels: None
    • Release Note:
      This changes the behaviour of the "hbase.writer.unified.encoded.blocksize.ratio" property.

      Previous behaviour: When delimiting hfile blocks, the writer checked whether (encoded block size >= "hbase.writer.unified.encoded.blocksize.ratio" * BLOCK_SIZE) || (non-encoded block size >= BLOCK_SIZE). Because the non-encoded condition was usually reached first, setting "hbase.writer.unified.encoded.blocksize.ratio" usually had no effect.
      The default value for "hbase.writer.unified.encoded.blocksize.ratio" was "1".

      New behaviour: If "hbase.writer.unified.encoded.blocksize.ratio" is set to anything other than "0", the writer checks only whether (encoded block size >= "hbase.writer.unified.encoded.blocksize.ratio" * BLOCK_SIZE) when delimiting an hfile block. If the property is not set, the writer checks whether (encoded block size >= BLOCK_SIZE) || (non-encoded block size >= BLOCK_SIZE).

    Description

      In HFileWriterImpl.checkBlockBoundary, we used to consider the unencoded and uncompressed data size when deciding whether to close a block and start a new one. That could lead to varying on-disk block sizes, depending on the encoding efficiency for the cells in each block.

      HBASE-17757 introduced the hbase.writer.unified.encoded.blocksize.ratio property, a ratio of the configured block size that is compared against the encoded size. This was an attempt to ensure homogeneous block sizes. However, the check introduced by HBASE-17757 also considers the unencoded size, so in cases where the encoding efficiency is higher than what is configured in hbase.writer.unified.encoded.blocksize.ratio, the unencoded limit is reached first and block sizes still vary.

      This patch changes that logic to consider only the encoded size if the hbase.writer.unified.encoded.blocksize.ratio property is set; otherwise, it considers the unencoded size. This gives finer control over on-disk block sizes and the overall number of blocks when encoding is in use.
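
The two boundary checks described in the release note can be compared side by side with a minimal Java sketch. It is not the HFileWriterImpl code: the class name, method names, and parameters are hypothetical, and sizes are passed in as plain numbers instead of being read from a block writer. A ratio of 0 is assumed to stand for "property not set".

```java
/** Illustrative only: hypothetical names, not the actual HBase implementation. */
final class BlockBoundarySketch {

  /** Previous behaviour: either the encoded limit or the unencoded limit closes the block. */
  static boolean shouldCloseBefore(long encodedSize, long unencodedSize,
      int blockSize, float ratio) {
    long encodedLimit = (long) (blockSize * ratio);
    return encodedSize >= encodedLimit || unencodedSize >= blockSize;
  }

  /** New behaviour: a non-zero ratio makes the encoded size the only criterion. */
  static boolean shouldCloseAfter(long encodedSize, long unencodedSize,
      int blockSize, float ratio) {
    if (ratio != 0f) {
      long encodedLimit = (long) (blockSize * ratio);
      return encodedSize >= encodedLimit;
    }
    // Assumption: ratio == 0 models the unset property; fall back to BLOCK_SIZE for both sizes.
    return encodedSize >= blockSize || unencodedSize >= blockSize;
  }

  public static void main(String[] args) {
    int blockSize = 64 * 1024;   // configured BLOCK_SIZE
    long unencoded = 64 * 1024;  // raw cell bytes written so far
    long encoded = 16 * 1024;    // same cells after encoding (4:1 efficiency)
    float ratio = 1.0f;

    System.out.println("before: " + shouldCloseBefore(encoded, unencoded, blockSize, ratio));
    System.out.println("after:  " + shouldCloseAfter(encoded, unencoded, blockSize, ratio));
  }
}
```

With a 4:1 encoding efficiency and a ratio of 1, the previous check closes the block as soon as 64 KB of unencoded data is written, leaving roughly 16 KB on disk. The new check keeps the block open until the encoded size itself reaches the limit.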

            People

              wchevreuil Wellington Chevreuil
              wchevreuil Wellington Chevreuil
