  1. HBase
  2. HBASE-1978

Change the range/block index scheme from [start,end) to (start,end], and index ranges/blocks by endKey, especially in HFile

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Later
    • None
    • None
    • Component/s: io, master, regionserver
    • None
    • HFile, METADATA, INDEX

    Description

      During the code review of HFile (HBASE-1818), we found that HFile allows duplicate keys. However, the old implementation can miss duplicate keys during seek and scan when the duplicates span multiple blocks.

      We provided a patch (HBASE-1841 is its first step) to resolve this issue. The patch modifies HFile.Writer so that it no longer generates a problematic HFile with duplicate keys that cross block boundaries: it starts a new block only when the key being appended differs from the last appended key. It still carries a risk, however. If a user of HFile.Writer appends the same key many times, the result is a very large block that needs a lot of memory or causes an out-of-memory error.
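
      The block-boundary rule described above can be sketched roughly as follows. This is a minimal illustration, not the HBASE-1841 patch itself: the class BlockBoundarySketch, the method splitIntoBlocks, and the use of a key count (rather than a byte size) as the block limit are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the writer rule; this is not the real HFile.Writer. */
final class BlockBoundarySketch {

    /**
     * Splits already-sorted keys into blocks. A block is closed only when it
     * has reached the (assumed) size limit AND the incoming key differs from
     * the last appended key, so a run of equal keys never crosses a block
     * boundary.
     */
    static List<List<String>> splitIntoBlocks(List<String> sortedKeys, int maxKeysPerBlock) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        String lastKey = null;
        for (String key : sortedKeys) {
            boolean blockFull = current.size() >= maxKeysPerBlock;
            boolean keyChanged = lastKey != null && !key.equals(lastKey);
            if (blockFull && keyChanged) {
                blocks.add(current);
                current = new ArrayList<>();
            }
            current.add(key);
            lastKey = key;
        }
        if (!current.isEmpty()) {
            blocks.add(current);
        }
        return blocks;
    }

    public static void main(String[] args) {
        // The run of "b" keys stays in one block even though the limit is 2.
        List<String> keys = List.of("a", "b", "b", "b", "b", "c");
        System.out.println(splitIntoBlocks(keys, 2)); // [[a, b, b, b, b], [c]]
    }
}
```

      The first block grows to five entries despite the limit of two, which is the memory risk mentioned above: a long enough run of one key makes the block arbitrarily large.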

      The current HFile block index uses startKey to index a block, i.e., the range/block index scheme is [startKey,endKey).

      Section 5.1 of the Google Bigtable paper states:

      "The METADATA table stores the location of a tablet under a row key that is an encoding of the tablet's table identifier and its end row."

      The principle behind Bigtable's METADATA table is the same as that of the BlockIndex in an SSTable or HFile, so we should use endKey in HFile's BlockIndex. In my experience with Hypertable, the METADATA row key is also "tableID:endRow".
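
      To illustrate how an index keyed by "tableID:endRow" is consulted, here is a small sketch using a sorted map. The class EndRowIndexSketch, the server names, and the "\uffff" sentinel for the last tablet are assumptions made for the example; the real Bigtable and Hypertable encodings are not reproduced here.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a METADATA-style index keyed by "tableID:endRow". */
final class EndRowIndexSketch {

    /** Builds a toy index for table "t1" with three tablets (made-up locations). */
    static TreeMap<String, String> sampleIndex() {
        TreeMap<String, String> index = new TreeMap<>();
        index.put("t1:g", "server-A");      // rows up to and including "g"
        index.put("t1:p", "server-B");      // rows after "g", up to and including "p"
        index.put("t1:\uffff", "server-C"); // assumed sentinel: last tablet of the table
        return index;
    }

    /**
     * Finds the tablet holding the row: the entry with the smallest key that is
     * greater than or equal to "tableId:row". Because the end row is inclusive,
     * a row equal to an end row belongs to that tablet.
     */
    static String locate(TreeMap<String, String> index, String tableId, String row) {
        Map.Entry<String, String> entry = index.ceilingEntry(tableId + ":" + row);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        TreeMap<String, String> index = sampleIndex();
        System.out.println(locate(index, "t1", "g")); // server-A
        System.out.println(locate(index, "t1", "h")); // server-B
        System.out.println(locate(index, "t1", "z")); // server-C
    }
}
```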

      We propose changing the index scheme in HFile from [startKey,endKey) to (startKey,endKey], and changing the binary search method to match this index scheme.

      This change resolves the duplicate-key issue described above.
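
      One way to see why is to compare the two lookups on a file where the same key spans several blocks. The sketch below is a simplified model, not the actual HFile code or the attached patch: the class BlockIndexLookupSketch and both method names are hypothetical, and keys are plain strings rather than byte arrays.

```java
/** Hypothetical model of block lookup under the two index schemes. */
final class BlockIndexLookupSketch {

    /** [startKey,endKey) model: index of the last block whose startKey <= key, or -1. */
    static int seekByStartKey(String[] startKeys, String key) {
        int lo = 0, hi = startKeys.length; // search window is [lo, hi)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (startKeys[mid].compareTo(key) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo - 1;
    }

    /** (startKey,endKey] model: index of the first block whose endKey >= key, or -1. */
    static int seekByEndKey(String[] endKeys, String key) {
        int lo = 0, hi = endKeys.length; // search window is [lo, hi)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (endKeys[mid].compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo < endKeys.length ? lo : -1;
    }

    public static void main(String[] args) {
        // Three blocks holding [a, b], [b, b], [b, c]: key "b" spans all of them.
        String[] startKeys = {"a", "b", "b"};
        String[] endKeys = {"b", "b", "c"};
        System.out.println(seekByStartKey(startKeys, "b")); // 2: skips the "b" entries in blocks 0 and 1
        System.out.println(seekByEndKey(endKeys, "b"));     // 0: a forward scan sees every "b"
    }
}
```

      In this model the end-key lookup always lands on the first block that can contain the key, so a forward scan from there visits all duplicates without the writer having to keep them in a single block.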

      Note:
      A complete fix requires modifying many modules in HBase, apparently including HFile, the META schema, and some internal code.

      Attachments

        1. HBASE-1978-HFile-v1.patch
          14 kB
          Schubert Zhang

            People

              Assignee: Unassigned
              Reporter: Schubert Zhang
              Votes: 0
              Watchers: 2