HBase / HBASE-6040

Use block encoding and HBase handled checksum verification in bulk loading using HFileOutputFormat


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.94.0, 0.95.2
    • Fix Version/s: 0.94.1
    • Component/s: mapreduce
    • Environment: None
    • Hadoop Flags: Reviewed
    • Release Note:
      Added a new config param "hbase.mapreduce.hfileoutputformat.datablock.encoding" with which we can specify the encoding scheme to be used on disk. Data will be written into HFiles using this encoding scheme during bulk load. The value can be NONE, PREFIX, DIFF or FAST_DIFF, as these are the DataBlockEncoding types supported now. [When any new types are added later, the corresponding names will also become valid.]
      The checksum type and the number of bytes per checksum can be configured using the config params hbase.hstore.checksum.algorithm and hbase.hstore.bytes.per.checksum respectively.
    • Labels: bulkload
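As a sketch of how a bulk-load driver might set the parameters the release note documents (the three key names come from this issue; the chosen values are illustrative, and `java.util.Properties` stands in for Hadoop's `Configuration` so the snippet is self-contained):

```java
import java.util.Properties;

// Illustrative sketch only: Properties stands in for Hadoop's Configuration.
// The three keys below are the ones documented in this issue; the values
// (FAST_DIFF, CRC32, 16384) are assumed example settings, not recommendations.
public class BulkLoadConfSketch {
    public static Properties bulkLoadConf() {
        Properties conf = new Properties();
        // On-disk data block encoding used while writing HFiles in bulk load.
        conf.setProperty("hbase.mapreduce.hfileoutputformat.datablock.encoding",
                "FAST_DIFF");
        // HBase-handled checksums: algorithm and bytes covered per checksum.
        conf.setProperty("hbase.hstore.checksum.algorithm", "CRC32");
        conf.setProperty("hbase.hstore.bytes.per.checksum", "16384");
        return conf;
    }

    public static void main(String[] args) {
        bulkLoadConf().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```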

Description

      When data is bulk loaded using HFileOutputFormat, we are not using the block encoding and the HBase-handled checksum features. When the writer is created for making the HFile, I am not seeing any such info passed to the WriterBuilder.
      In HFileOutputFormat.getNewWriter(byte[] family, Configuration conf), we don't have this info and do not pass it to the writer either, so those HFiles will not have these optimizations.

      Later, in LoadIncrementalHFiles.copyHFileHalf(), where we physically split one HFile (created by the MR job) if it cannot belong to just one region, I can see we pass the data block encoding details and checksum details to the new HFile writer. But this step won't happen in the normal case, I think.
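A minimal sketch of the idea behind the fix: resolve the encoding name from the new config param when getNewWriter() builds the HFile writer. The enum here is a local stand-in for HBase's DataBlockEncoding so the example compiles on its own; only the name-to-enum lookup is shown, not the actual WriterBuilder wiring.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: the enum mirrors the DataBlockEncoding names listed in the
// release note, standing in for HBase's real enum. The config key is the one
// this issue adds.
public class EncodingResolveSketch {
    enum DataBlockEncoding { NONE, PREFIX, DIFF, FAST_DIFF }

    static final String KEY =
            "hbase.mapreduce.hfileoutputformat.datablock.encoding";

    // Defaults to NONE when the param is unset; an unknown name fails fast
    // with an IllegalArgumentException from valueOf.
    static DataBlockEncoding resolve(Map<String, String> conf) {
        return DataBlockEncoding.valueOf(conf.getOrDefault(KEY, "NONE"));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(KEY, "FAST_DIFF");
        System.out.println(resolve(conf)); // FAST_DIFF
    }
}
```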

      Attachments

        1. HBASE-6040_Trunk.patch
          3 kB
          Anoop Sam John
        2. HBASE-6040_94.patch
          3 kB
          Anoop Sam John


People

              Assignee: Anoop Sam John
              Reporter: Anoop Sam John
              Votes: 0
              Watchers: 10
