HBASE-5032

Add other DELETE type information into the delete bloom filter to optimize the time range query

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.89-fb
    • Component/s: None
    • Labels: None

      Description

      To speed up time range scans we need to seek to the maximum timestamp of the requested range, instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to the end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip ahead to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can use another Bloom filter keyed on (row, column) to find that out quickly. (From HBASE-4962)
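
      The seek decision described above can be sketched roughly as follows. This is a minimal illustration, not HBase code: the `DeleteBloom` class, its hash function, and `firstTimestampToVisit` are all hypothetical names invented for this example. It assumes that KVs within a (row, column) are ordered newest first, so starting at `Long.MAX_VALUE` means starting at the top of the column.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

public class DeleteAwareSeek {

    /** Toy Bloom filter keyed on (row, column). Hypothetical; not the HBase implementation. */
    static final class DeleteBloom {
        private final BitSet bits;
        private final int size;
        private final int hashes;

        DeleteBloom(int size, int hashes) {
            this.bits = new BitSet(size);
            this.size = size;
            this.hashes = hashes;
        }

        // Simple seeded polynomial hash, chosen only to keep the sketch short.
        private int index(byte[] key, int seed) {
            int h = (seed + 1) * 0x9E3779B9;
            for (byte b : key) {
                h = h * 31 + (b & 0xff);
            }
            return Math.floorMod(h, size);
        }

        private static byte[] key(String row, String column) {
            return (row + '\0' + column).getBytes(StandardCharsets.UTF_8);
        }

        void add(String row, String column) {
            byte[] k = key(row, column);
            for (int i = 0; i < hashes; i++) {
                bits.set(index(k, i));
            }
        }

        /** May return a false positive, never a false negative. */
        boolean mightContain(String row, String column) {
            byte[] k = key(row, column);
            for (int i = 0; i < hashes; i++) {
                if (!bits.get(index(k, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    /**
     * Returns the timestamp to seek to within (row, column).
     * If the filter says a delete marker may exist, start at the newest KV
     * so that no marker is skipped; otherwise jump straight to timeRangeMax.
     */
    static long firstTimestampToVisit(DeleteBloom bloom, String row, String column,
                                      long timeRangeMax) {
        if (bloom.mightContain(row, column)) {
            return Long.MAX_VALUE; // conservative path: scan from the top of the column
        }
        return timeRangeMax;       // fast path: no delete marker recorded for this pair
    }
}
```

      A false positive from the filter only costs the extra iteration that the scan performs today; it never causes a delete marker to be missed.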

      The motivation is to save seek operations in time-range scans when we know there is no delete marker for a given row/column.

      From the implementation perspective, we already have a delete family Bloom filter that contains all the DeleteFamily KeyValues. We can therefore reuse the same Bloom filter for the other kinds of delete markers, such as DeleteColumn and Delete.
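
      One way to picture the reuse of a single filter for every delete type is sketched below. The keying scheme is an assumption made for this example only: DeleteFamily markers are keyed on the row, while DeleteColumn and Delete markers are keyed on (row, qualifier). `SharedDeleteFilter`, `MarkerType`, and the method names are hypothetical, and a `HashSet` stands in for the Bloom filter to keep the sketch short.

```java
import java.util.HashSet;
import java.util.Set;

public class SharedDeleteFilter {

    /** Hypothetical marker types, loosely mirroring the KeyValue types named in the issue. */
    enum MarkerType { PUT, DELETE, DELETE_COLUMN, DELETE_FAMILY }

    // Stand-in for the Bloom filter; a real filter would be probabilistic.
    private final Set<String> filter = new HashSet<>();

    /** Assumed keying: family deletes by row, all other deletes by (row, qualifier). */
    static String key(MarkerType type, String row, String qualifier) {
        return type == MarkerType.DELETE_FAMILY ? row : row + '\0' + qualifier;
    }

    /** Called for every KV written to a store file; only delete markers are recorded. */
    void onWrite(MarkerType type, String row, String qualifier) {
        if (type == MarkerType.PUT) {
            return;
        }
        filter.add(key(type, row, qualifier));
    }

    /** True if any kind of delete marker may affect (row, qualifier). */
    boolean mayHaveDelete(String row, String qualifier) {
        return filter.contains(row) || filter.contains(row + '\0' + qualifier);
    }
}
```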

            People

            • Assignee:
              adela Adela Maznikar
              Reporter:
              liyin Liyin Tang
