HBase / HBASE-4532

Avoid top row seek by dedicated bloom filter for delete family bloom filter


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The previous jira, HBASE-4469, avoids the top row seek operation only when the row-col bloom filter is enabled.
      This jira avoids the top row seek in all cases by creating a dedicated bloom filter that covers only delete family markers.

      The only subtle case is when the kv we are interested in is in the top row with an empty column.

      For example,
      suppose we are interested in row1/cf1:/1/put.
      We would seek to the top row, row1/cf1:/MAX_TS/MAXIMUM, and the delete family bloom filter would say there is NO delete family.
      The optimization would then skip the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
      At that point we have already skipped past the real kv we are interested in.

      The solution is to disable this optimization whenever we GET/SCAN a row with an empty column.
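The decision described above can be sketched as follows. This is only an illustration of the rule, not the code from the patch: the class, the interface and the method names are hypothetical, and an exact set stands in for a real Bloom filter (which could also return false positives, but never false negatives).

```java
import java.util.Set;

// Hypothetical sketch; none of these names are HBase's actual API.
final class TopRowSeekSketch {

    /** Stand-in for a Bloom filter over the rows that contain a delete family marker. */
    interface DeleteFamilyBloom {
        boolean mightContain(String row);
    }

    /**
     * Returns true when the seek to the top of the row can be skipped.
     * The seek exists only to find a delete family marker, so it is unnecessary
     * when the filter says the row has none, unless the requested column is empty.
     */
    static boolean canSkipTopRowSeek(DeleteFamilyBloom bloom, String row, String qualifier) {
        // Empty column: the kv we want lives in the top row itself,
        // so skipping the seek would jump past it.
        if (qualifier.isEmpty()) {
            return false;
        }
        // "Not present" from a Bloom filter is definitive (no false negatives).
        return !bloom.mightContain(row);
    }

    public static void main(String[] args) {
        Set<String> rowsWithDeleteFamily = Set.of("row2");
        DeleteFamilyBloom bloom = rowsWithDeleteFamily::contains;

        System.out.println(canSkipTopRowSeek(bloom, "row1", "q1")); // true
        System.out.println(canSkipTopRowSeek(bloom, "row1", ""));   // false
        System.out.println(canSkipTopRowSeek(bloom, "row2", "q1")); // false
    }
}
```

The second call is the subtle case from the description: the filter reports no delete family for row1, yet the seek is still required because the column is empty.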

      Evaluation from TestSeekOptimization:
      Previously:
      For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
      For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
      For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

      For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
      For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
      For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

      So we get about 10% more seek savings ONLY if the ROWCOL bloom filter is enabled (HBASE-4469).

      ================================================

      After this change:
      For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
      For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
      For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

      For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
      For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
      For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

      So we now get about 10% more seek savings for ALL bloom filter settings (NONE, ROW and ROWCOL).
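As a quick check of the arithmetic, the savings percentages can be recomputed from the seek counts reported by TestSeekOptimization. The class and method names below are made up for this sketch; only the three seek counts come from the results above.

```java
import java.util.Locale;

// Hypothetical helper for re-deriving the reported percentages.
final class SeekSavings {

    /** Percentage of seeks avoided, given the counts without and with the optimization. */
    static double savingsPercent(int seeksWithout, int seeksWith) {
        return 100.0 * (seeksWithout - seeksWith) / seeksWithout;
    }

    public static void main(String[] args) {
        double before = savingsPercent(2506, 1714); // NONE/ROW before this change
        double after = savingsPercent(2506, 1458);  // every setting after this change
        System.out.println(String.format(Locale.ROOT,
                "before: %.2f%%, after: %.2f%%, gain: %.2f points",
                before, after, after - before));
        // prints "before: 31.60%, after: 41.82%, gain: 10.22 points"
    }
}
```

The "about 10%" in the summary is therefore a difference of roughly 10 percentage points in savings, from 31.60% to 41.82%.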


          People

            Assignee: Liyin Tang
            Reporter: Liyin Tang
            Votes: 2
            Watchers: 12
