HBase / HBASE-6014

Support for block-granularity bitmap indexes

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: regionserver
    • Labels:
      None

      Description

      This came up in a discussion with Kannan today, so I promised to write something brief on JIRA – this was suggested as a potential summer intern project. The idea is as follows:

      We have several customers who periodically run full table scan MR jobs against large HBase tables while applying fairly restrictive predicates. The predicates are often reasonably simple boolean expressions across known columns, and those columns often are enum-typed or otherwise have a fairly restricted range of values. For example, a real time process may mark rows as dirty, and a background MR job may scan for dirty rows in order to perform further processing like rebuilding inverted indexes.

      One way to speed up this type of query is to add bitmap indexes. In the context of HBase, I would envision this as a new type of metadata block included in the HFile which has a series of tuples: (qualifier, value range, compressed bitmap). A 1 bit in the bitmap indicates that the corresponding HFile block has at least one cell for which a column with the given qualifier falls within the given range. Queries which have an equality or comparison predicate against an indexed qualifier can then use the bitmap index to seek directly to those blocks which may contain relevant data.
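      A minimal sketch of the lookup side of this idea, assuming one bitmap per (qualifier, value) pair and one bit per HFile block (class and method names here are hypothetical, and a real implementation would use compressed bitmaps rather than `java.util.BitSet`):

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one bit per HFile block, one bitmap per (qualifier, value).
public class BlockBitmapIndex {
    private final Map<String, BitSet> bitmaps = new HashMap<>();
    private final int numBlocks;

    public BlockBitmapIndex(int numBlocks) { this.numBlocks = numBlocks; }

    // Record that block `blockIdx` contains at least one cell where the
    // column with this qualifier has this value.
    public void mark(String qualifier, String value, int blockIdx) {
        bitmaps.computeIfAbsent(qualifier + "=" + value, k -> new BitSet(numBlocks))
               .set(blockIdx);
    }

    // Blocks that may satisfy an equality predicate; a missing entry means
    // no block in this file can match.
    public BitSet candidateBlocks(String qualifier, String value) {
        BitSet bits = bitmaps.get(qualifier + "=" + value);
        return bits == null ? new BitSet(numBlocks) : (BitSet) bits.clone();
    }

    public static void main(String[] args) {
        BlockBitmapIndex idx = new BlockBitmapIndex(4);
        idx.mark("dirty", "1", 0);   // only blocks 0 and 3 hold dirty rows
        idx.mark("dirty", "1", 3);
        BitSet hits = idx.candidateBlocks("dirty", "1");
        // A scanner would seek only to blocks 0 and 3 and skip 1 and 2.
        System.out.println(hits);    // prints {0, 3}
    }
}
```

      A predicate like `dirty = 1` then becomes a bitmap lookup followed by seeks to only the set blocks, instead of a full scan of the file.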

        Attachments

      1. bitmap-hacking.txt
        24 kB
        Todd Lipcon
      2. 6014-bitmap-hacking.txt
        24 kB
        Ted Yu

          Activity

          lemire Daniel Lemire added a comment -

          A good choice for this might be Roaring bitmaps (http://roaringbitmap.org/). They are used by Apache Spark, Druid, Apache Kylin, Apache Lucene and so forth. No patent, Apache license.

          kannanm Kannan Muthukkaruppan added a comment -

          Can't think of one that'll immediately benefit from this. So this will be low-pri for us too right now.

          tlipcon Todd Lipcon added a comment -

          Hey Kannan. You're exactly right on both cases. I'm not certain if this is a huge win yet - it's probably only helpful on some very specific workloads. So, we're not prioritizing it here. But really curious to hear if you think there are some applications at FB that would benefit.

          kannanm Kannan Muthukkaruppan added a comment -

          @Todd:

          Some early questions I have:

           1) A bit per block may not be very effective in many cases. E.g., in the "mark rows as dirty" example in your description, suppose each HFileBlock has at least one dirty KV; then no blocks will get pruned. Similarly, in many classic cases, like say state names, it is quite possible that every block contains almost every state. So the use of the feature will be limited to really narrow selectivity -- where we expect only a small % of the blocks in the file to contain the data of interest. Is this the model/use case you are targeting? [Just want to make sure.]

           2) Also, regarding <<< metadata block included in the HFile which has a series of tuples: (qualifier, value range, compressed bitmap) >>>: could you clarify what the "value range" is about? For the "enum" type use, the tuples will be "qualifier, enum for value, compressed bitmap", isn't it? And one such tuple per block for each enum, correct? Is the "value range" for cases where, say, you want to query the column value by range (e.g., say temperature)? And is the idea to slice the range of values for the column (say temperatures) into sub-ranges and have a bitmap per range, thus allowing users to do range queries by consulting the appropriate bitmaps?
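           The range-slicing idea Kannan asks about could be sketched as follows (a hypothetical toy, not part of any patch here: fixed-width bins over a numeric column's domain, one uncompressed block bitmap per bin, and a range query that ORs the overlapping bins):

```java
import java.util.BitSet;

// Hypothetical sketch of the range-binning idea: slice a numeric column's
// domain into fixed-width sub-ranges and keep one block bitmap per sub-range.
public class BinnedBitmapIndex {
    private final BitSet[] bins;      // one bitmap per value sub-range
    private final double lo, binWidth;

    public BinnedBitmapIndex(double lo, double hi, int numBins, int numBlocks) {
        this.lo = lo;
        this.binWidth = (hi - lo) / numBins;
        this.bins = new BitSet[numBins];
        for (int i = 0; i < numBins; i++) bins[i] = new BitSet(numBlocks);
    }

    private int binOf(double v) {
        int b = (int) ((v - lo) / binWidth);
        return Math.max(0, Math.min(bins.length - 1, b));
    }

    // Record that block `blockIdx` contains a cell with this value.
    public void mark(double value, int blockIdx) {
        bins[binOf(value)].set(blockIdx);
    }

    // Blocks that may contain a value in [from, to]: OR the overlapping bins.
    public BitSet candidateBlocks(double from, double to) {
        BitSet out = new BitSet();
        for (int b = binOf(from); b <= binOf(to); b++) out.or(bins[b]);
        return out;
    }

    public static void main(String[] args) {
        // Temperatures 0..100 sliced into 10 bins; 4 blocks in the file.
        BinnedBitmapIndex idx = new BinnedBitmapIndex(0, 100, 10, 4);
        idx.mark(12.0, 0);  // block 0 has a temperature in bin [10,20)
        idx.mark(55.0, 2);  // block 2 has one in bin [50,60)
        idx.mark(95.0, 3);
        System.out.println(idx.candidateBlocks(50, 70)); // prints {2}
    }
}
```

           Binning makes range predicates answerable from the same per-block bitmaps, at the cost of some false positives at the bin boundaries.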

          zhihyu@ebaysf.com Ted Yu added a comment -

           I didn't rebase Todd's patch for trunk because the pom.xml structure changed.

          I moved TestByteArrayCuckooMap from src/main to src/test

          tlipcon Todd Lipcon added a comment -

          Oh, I also did some hacking on the airplane: this WIP patch adds:

           • a cuckoo hashtable implementation which uses byte array ranges as keys. I needed this to avoid having to do copies into byte[] for use with a normal TreeMap. It could have been a normal separate-chaining hashtable, but I've been wanting to implement a cuckoo hashmap for a while
           • an import of JavaEWAH, a compressed bitmap index library
           • hooks into the HFile write path to write out bitmap index blocks with the HFile

           Obviously far from anything useful, but I'm posting it here in case anyone's interested in running with it.
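           For reference, the cuckoo hashing idea above reduces to something like this minimal toy (a hypothetical set over int keys, not the patch's byte-array-range implementation: two tables, two hash functions, and inserts that displace an occupant to its alternate slot):

```java
import java.util.Arrays;

// Hypothetical minimal cuckoo hash set over int keys: every key lives in one
// of two slots (t1[h1(k)] or t2[h2(k)]), so lookups probe at most two places.
public class CuckooSet {
    private static final int EMPTY = Integer.MIN_VALUE; // sentinel; keys must differ
    private final int[] t1, t2;

    public CuckooSet(int capacity) {
        t1 = new int[capacity]; t2 = new int[capacity];
        Arrays.fill(t1, EMPTY); Arrays.fill(t2, EMPTY);
    }

    private int h1(int k) { return Math.floorMod(k * 0x9E3779B9, t1.length); }
    private int h2(int k) { return Math.floorMod(Integer.rotateLeft(k, 16) * 0x85EBCA6B, t2.length); }

    public boolean contains(int k) {
        return t1[h1(k)] == k || t2[h2(k)] == k;
    }

    public void insert(int k) {
        if (contains(k)) return;
        for (int kicks = 0; kicks < 32; kicks++) {
            int i = h1(k);
            int evicted = t1[i]; t1[i] = k;   // claim the slot in table 1
            if (evicted == EMPTY) return;
            k = evicted;                      // kick the occupant to table 2
            int j = h2(k);
            evicted = t2[j]; t2[j] = k;
            if (evicted == EMPTY) return;
            k = evicted;                      // ...and back toward table 1
        }
        throw new IllegalStateException("cycle: a real impl would rehash/grow here");
    }

    public static void main(String[] args) {
        CuckooSet s = new CuckooSet(64);
        for (int k = 1; k <= 20; k++) s.insert(k);
        System.out.println(s.contains(7) && !s.contains(99)); // prints true
    }
}
```

           The appeal over separate chaining is the worst-case constant-time lookup: membership is decided by at most two array probes.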

          tlipcon Todd Lipcon added a comment -

           I had some airplane time last week so I spent some of it thinking about how to go about this. It's not quite as straightforward in HBase as it is in your standard RDBMS – at least on the read path:

          This is because a given row may actually span multiple "layers" of the stack of storefiles. For example, for a query "col_a = 1 AND col_b = 1", we may have the following files in a region:

           HFile 1 (newer): row1: col_a="1", no value for col_b
           HFile 2 (older): row1: col_a="2", col_b="1"
           (So the "current" merged version of the row has col_a=1 and col_b=1.)

          So, if we naively apply the bitmap index to each HFile in turn, we would end up excluding the block in both, and not see the correct result.

          Instead, I think we have to apply each predicate to each HFile in turn to come up with a set of ranges:

          Predicate col_a = 1, HFile 1: output key range for block containing row1
          Predicate col_a = 1, HFile 2: no output

          Predicate col_b = 1, HFile 1: no output
           Predicate col_b = 1, HFile 2: output key range for block containing row1

          For each predicate, we take the union of the key ranges across HFiles. Then, we take the intersection across predicates, to come up with a total set of applicable key ranges. Then, we can push this key range list down into the scanner stack to provide skip-ahead hints in the filter.
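           The union/intersection step above can be sketched as follows (a hypothetical toy using [start, end] intervals over long "row keys"; real key ranges would be byte-array row keys):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the range algebra: per predicate, union the key
// ranges reported by each HFile's bitmap index; then intersect across
// predicates to get the ranges the scanner stack may actually need to visit.
public class RangeAlgebra {
    // Merge overlapping/adjacent ranges after concatenating per-file results.
    static List<long[]> union(List<long[]> ranges) {
        List<long[]> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong(r -> r[0]));
        List<long[]> out = new ArrayList<>();
        for (long[] r : sorted) {
            if (!out.isEmpty() && r[0] <= out.get(out.size() - 1)[1] + 1) {
                out.get(out.size() - 1)[1] = Math.max(out.get(out.size() - 1)[1], r[1]);
            } else {
                out.add(new long[] { r[0], r[1] });
            }
        }
        return out;
    }

    // Intersect two merged range lists with a two-pointer sweep.
    static List<long[]> intersect(List<long[]> a, List<long[]> b) {
        List<long[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            long lo = Math.max(a.get(i)[0], b.get(j)[0]);
            long hi = Math.min(a.get(i)[1], b.get(j)[1]);
            if (lo <= hi) out.add(new long[] { lo, hi });
            if (a.get(i)[1] < b.get(j)[1]) i++; else j++;
        }
        return out;
    }

    public static void main(String[] args) {
        // Predicate col_a = 1: HFile 1 reports [100,199], HFile 2 nothing.
        List<long[]> colA = union(List.of(new long[] { 100, 199 }));
        // Predicate col_b = 1: HFile 1 nothing, HFile 2 reports [150,249].
        List<long[]> colB = union(List.of(new long[] { 150, 249 }));
        for (long[] r : intersect(colA, colB))
            System.out.println(r[0] + ".." + r[1]); // prints 150..199
    }
}
```

           The key property is that the union happens per predicate before the intersection, so a row whose matching cells are split across storefile layers is never incorrectly pruned.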

          Any interns out there interested in this?


             People

             • Assignee: Unassigned
             • Reporter: tlipcon Todd Lipcon
             • Votes: 0
             • Watchers: 18
