Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      When designing an HBase schema, it is quite common to combine multiple pieces of information within the rowkey. For instance, assume that the rowkey is constructed as md5(id1) + id1 + id2, and the user wants to scan all the rowkeys that belong to a given id1, that is, all rowkeys sharing the prefix md5(id1) + id1. In such a case, a prefix-based rowkey bloom filter is able to cut more unnecessary seeks during the scan.
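The idea can be illustrated with a minimal sketch. This is not HBase code: the class `PrefixBloomFilter`, the helper `files_to_scan`, the fixed 16-byte prefix length (the MD5 digest), and the filter sizing are all assumptions made for illustration. Each store file keeps a bloom filter over rowkey prefixes, and a scan for a given id1 only opens the files whose filter may contain that prefix.

```python
import hashlib

PREFIX_LEN = 16  # assumption: the fixed-length prefix is the 16-byte md5(id1) digest


def make_rowkey(id1: str, id2: str) -> bytes:
    """Build a rowkey as md5(id1) + id1 + id2, as in the description."""
    return hashlib.md5(id1.encode()).digest() + id1.encode() + id2.encode()


class PrefixBloomFilter:
    """Toy bloom filter over the first PREFIX_LEN bytes of each rowkey."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, prefix: bytes):
        # Derive num_hashes bit positions by hashing the prefix with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + prefix).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add_rowkey(self, rowkey: bytes) -> None:
        for pos in self._positions(rowkey[:PREFIX_LEN]):
            self.bits |= 1 << pos

    def might_contain_prefix(self, prefix: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(prefix[:PREFIX_LEN]))


def files_to_scan(files: dict, id1: str) -> list:
    """Return the names of the files a scan for id1 cannot skip."""
    prefix = hashlib.md5(id1.encode()).digest()
    return [name for name, bf in files.items() if bf.might_contain_prefix(prefix)]


# Hypothetical store files: file_a only holds rows for user1, file_b only for user2.
files = {"file_a": PrefixBloomFilter(), "file_b": PrefixBloomFilter()}
for id2 in ("x", "y", "z"):
    files["file_a"].add_rowkey(make_rowkey("user1", id2))
    files["file_b"].add_rowkey(make_rowkey("user2", id2))
```

A scan for "user1" then calls `files_to_scan(files, "user1")` and seeks only into the returned files. A bloom filter never gives false negatives, so a file that does hold the prefix is always returned. It can give false positives, so an unrelated file may occasionally be returned as well.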

        Activity

        Liyin Tang created issue -
        Liyin Tang added a comment -

        This feature should also benefit salted tables.

        Liang Xie added a comment -

        I thought about this several weeks ago as well, after reading the RocksDB docs. Good stuff, Liyin!
        Another issue we could file alongside this one is support for a pluggable memstore implementation, so that we could introduce a prefix-hash memstore. It would be more efficient in the scan + prefix filter scenario.
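As a rough illustration of what a prefix-hash memstore could look like, here is a toy sketch. It is neither HBase nor RocksDB code: the class name `PrefixHashMemstore`, the fixed `prefix_len`, and the use of a sorted Python list per bucket (where a real implementation would more likely use a skip list) are all assumptions.

```python
import bisect
from collections import defaultdict


class PrefixHashMemstore:
    """Toy in-memory store: hash buckets keyed by rowkey prefix, sorted within a bucket."""

    def __init__(self, prefix_len: int):
        self.prefix_len = prefix_len
        # prefix -> list of (rowkey, value) pairs kept sorted by rowkey
        self.buckets = defaultdict(list)

    def put(self, rowkey: bytes, value: bytes) -> None:
        bucket = self.buckets[rowkey[:self.prefix_len]]
        # O(n) key extraction is fine for a sketch, not for production use.
        keys = [k for k, _ in bucket]
        i = bisect.bisect_left(keys, rowkey)
        if i < len(bucket) and bucket[i][0] == rowkey:
            bucket[i] = (rowkey, value)  # overwrite an existing rowkey
        else:
            bucket.insert(i, (rowkey, value))

    def scan_prefix(self, prefix: bytes) -> list:
        """Return all (rowkey, value) pairs whose rowkey starts with prefix, in order."""
        if len(prefix) < self.prefix_len:
            raise ValueError("scan prefix must be at least prefix_len bytes long")
        # One hash lookup finds the only bucket that can hold matching rows.
        bucket = self.buckets.get(prefix[:self.prefix_len], [])
        return [(k, v) for k, v in bucket if k.startswith(prefix)]
```

With this layout a prefix scan touches a single bucket instead of seeking through one globally sorted structure. The trade-off is that a full ordered scan across prefixes needs to merge the buckets.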

        Liyin Tang added a comment -

        Yes, a prefix-hash memstore will help this case as well! It is definitely worth benchmarking.

        Liyin Tang made changes -
        Field: Description
        Original Value: When designing HBase schema for some use cases, it is quite common to combine multiple information within the RowKey. For instance, assuming that rowkey is constructed as md5(id1) + id1 + id2, and user wants to scan all the rowkeys which starting at id1. In such case, the rowkey bloom filter is able to cut more unnecessary seeks during the scan.
        New Value: When designing HBase schema for some use cases, it is quite common to combine multiple information within the RowKey. For instance, assuming that rowkey is constructed as md5(id1) + id1 + id2, and user wants to scan all the rowkeys which starting by id1. In such case, the rowkey bloom filter is able to cut more unnecessary seeks during the scan.
        Manukranth Kolloju added a comment -

        Liang Xie, interesting idea. Liyin Tang, how will the prefix-based bloom filter help in salted tables when the salting criterion is not related to the most significant part of the row key?

        Liyin Tang added a comment -

        Interesting... If the most significant part of the row key is evenly distributed across the row key space, we usually don't need to salt the table, right?


          People

          • Assignee:
            Unassigned
            Reporter:
            Liyin Tang
          • Votes:
            0
            Watchers:
            12