HBASE-16213: A new HFileBlock structure for fast random get

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0, 2.0.0
    • Component/s: Performance
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      HBASE-16213 introduces a new DataBlockEncoding named ROW_INDEX_V1, which can improve random read (get) performance, especially when the average record size (key-value size per row) is small. To use this feature, set DATA_BLOCK_ENCODING to ROW_INDEX_V1 on the column family (CF) of a newly created table, or change an existing CF with the command below:
      alter 'table_name', {NAME => 'cf', DATA_BLOCK_ENCODING => 'ROW_INDEX_V1'}

      Note that with this data block encoding (DBE) turned on, HFile blocks are bigger than with NONE encoding, because the encoding adds some metadata for binary search:
      /**
       * Store cells following every row's start offset, so we can binary search to a row's cells.
       *
       * Format:
       * flat cells
       * integer: number of rows
       * integer: row0's offset
       * integer: row1's offset
       * ....
       * integer: dataSize
       *
      */
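
The layout in the comment above can be illustrated with a small, self-contained sketch. This is not the HBase implementation: the class name RowIndexLayoutSketch, the simplified cell format (row length, row bytes, value length, value bytes), and the helper methods are all assumptions made for illustration. Only the trailer order (number of rows, one offset per row, dataSize) follows the format described in the release note.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RowIndexLayoutSketch {

    /** Appends one simplified (hypothetical) cell: [rowLen:int][row][valueLen:int][value]. */
    static void putCell(ByteBuffer buf, String row, String value) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        buf.putInt(r.length).put(r).putInt(v.length).put(v);
    }

    /** Encodes row-sorted {row, value} pairs as: flat cells, numRows, row offsets, dataSize. */
    static ByteBuffer encode(String[][] sortedCells) {
        ByteBuffer buf = ByteBuffer.allocate(4096); // large enough for this small example
        List<Integer> rowOffsets = new ArrayList<>();
        String prevRow = null;
        for (String[] cell : sortedCells) {
            if (!cell[0].equals(prevRow)) {
                rowOffsets.add(buf.position()); // first cell of a new row starts here
                prevRow = cell[0];
            }
            putCell(buf, cell[0], cell[1]);
        }
        int dataSize = buf.position();          // end of the flat cells
        buf.putInt(rowOffsets.size());          // integer: number of rows
        for (int offset : rowOffsets) {
            buf.putInt(offset);                 // integer: rowN's offset
        }
        buf.putInt(dataSize);                   // integer: dataSize
        buf.flip();
        return buf;
    }

    /** dataSize is the last integer of the block, so a reader can find the index from the end. */
    static int dataSize(ByteBuffer block) {
        return block.getInt(block.limit() - 4);
    }

    /** The row count is stored immediately after the flat cells. */
    static int numRows(ByteBuffer block) {
        return block.getInt(dataSize(block));
    }

    /** Start offset of the i-th row within the flat cells. */
    static int rowOffset(ByteBuffer block, int i) {
        return block.getInt(dataSize(block) + 4 + 4 * i);
    }

    public static void main(String[] args) {
        ByteBuffer block = encode(new String[][] {
            {"r1", "a"}, {"r1", "b"}, {"r2", "c"}, {"r3", "d"}
        });
        System.out.println("dataSize=" + dataSize(block)
            + " dataWithMetaSize=" + block.limit()
            + " numRows=" + numRows(block));
        for (int i = 0; i < numRows(block); i++) {
            System.out.println("row" + i + " offset=" + rowOffset(block, i));
        }
    }
}
```

In this sketch, with 4-byte integers, the added metadata is 4 * (numRows + 2) bytes per block, which is the sense in which an encoded block is bigger than with NONE encoding.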

      Seeking to the row during random reads is one of the main consumers of CPU, and this encoding helps. See slide #7 here: https://www.slideshare.net/HBaseCon/lift-the-ceiling-of-hbase-throughputs?qid=597ee2fa-8125-4faa-bb3b-2bf1ba9ccafb&v=&b=&from_search=6

      Description

      An HFileBlock stores cells sequentially. Currently, to get a row from a block, the reader scans from the first cell until it reaches the row's cells.
      The new structure stores every row's start offset together with the data, so the reader can find the exact row with a binary search.
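
The difference between the two seek strategies can be sketched as follows. This is a simplified model under stated assumptions, not HBase code: a block is modelled as an array holding the row key of each cell in sorted order, the row index as an array of positions where each row starts, and the names RowSeekSketch, linearSeek, buildRowIndex, and indexedSeek are hypothetical. The comparison counter exists only to make the cost visible.

```java
import java.util.ArrayList;
import java.util.List;

public class RowSeekSketch {

    /** Number of key comparisons made by the most recent seek (illustration only). */
    static int comparisons;

    /** Sequential seek: walk the cells from the first one until the row is reached or passed. */
    static int linearSeek(String[] cellRows, String target) {
        comparisons = 0;
        for (int i = 0; i < cellRows.length; i++) {
            comparisons++;
            int c = cellRows[i].compareTo(target);
            if (c == 0) return i;   // first cell of the target row
            if (c > 0) return -1;   // passed the target: the row is not in this block
        }
        return -1;
    }

    /** Builds the row index: the position of the first cell of every distinct row. */
    static int[] buildRowIndex(String[] cellRows) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i < cellRows.length; i++) {
            if (i == 0 || !cellRows[i].equals(cellRows[i - 1])) {
                starts.add(i);
            }
        }
        int[] result = new int[starts.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = starts.get(i);
        }
        return result;
    }

    /** Indexed seek: binary search over the row starts, touching only one cell per probe. */
    static int indexedSeek(String[] cellRows, int[] rowStarts, String target) {
        comparisons = 0;
        int lo = 0;
        int hi = rowStarts.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            comparisons++;
            int c = cellRows[rowStarts[mid]].compareTo(target);
            if (c == 0) return rowStarts[mid];
            if (c < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 1000 rows with a single cell each, mirroring the one-qualifier-per-row setup.
        String[] cellRows = new String[1000];
        for (int i = 0; i < cellRows.length; i++) {
            cellRows[i] = String.format("row%04d", i);
        }
        int[] rowStarts = buildRowIndex(cellRows);

        int a = linearSeek(cellRows, "row0900");
        System.out.println("linear:  position=" + a + " comparisons=" + comparisons);

        int b = indexedSeek(cellRows, rowStarts, "row0900");
        System.out.println("indexed: position=" + b + " comparisons=" + comparisons);
    }
}
```

The sequential seek costs a number of comparisons proportional to the target's position in the block, while the indexed seek is logarithmic in the number of rows. That is why the benefit is expected to be largest when rows are small and a block therefore holds many of them.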

      I used EncodedSeekPerformanceTest to test the performance.
      First, YCSB was used to write 1,000,000 rows (100w); every row has only one qualifier, with valueLength = 16 B / 64 B / 256 B / 1 KB.
      Then EncodedSeekPerformanceTest was used to test random reads of 10,000 (1w) or 1,000,000 (100w) rows, and also to record the HFileBlock's dataSize/dataWithMetaSize for each encoding.

        Attachments

        1. HBASE-16213_branch1_v3.patch
          91 kB
          Lijin Bin
        2. HBASE-16213_v2.patch
          89 kB
          Lijin Bin
        3. HBASE-16213.branch-1.v1.patch
          32 kB
          Lijin Bin
        4. HBASE-16213.branch-1.v4.patch
          32 kB
          Lijin Bin
        5. HBASE-16213.branch-1.v4.patch
          32 kB
          Lijin Bin
        6. HBASE-16213.patch
          89 kB
          Lijin Bin
        7. HBASE-16213-master_v1.patch
          41 kB
          Lijin Bin
        8. HBASE-16213-master_v3.patch
          39 kB
          Lijin Bin
        9. HBASE-16213-master_v4.patch
          39 kB
          Lijin Bin
        10. HBASE-16213-master_v5.patch
          38 kB
          Lijin Bin
        11. HBASE-16213-master_v6.patch
          38 kB
          Lijin Bin
        12. hfile_block_performance_E2E.pptx
          61 kB
          Lijin Bin
        13. hfile_block_performance.pptx
          68 kB
          Lijin Bin
        14. hfile_block_performance2.pptx
          84 kB
          Lijin Bin
        15. hfile-cpu.png
          27 kB
          Lijin Bin

              People

              • Assignee:
                binlijin Lijin Bin
                Reporter:
                binlijin Lijin Bin