HBase / HBASE-6618

Implement FuzzyRowFilter with ranges support

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Later
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Filters
    • Labels: None

    Description

      Apart from the current ability to specify a fuzzy row filter, e.g. ????_0004 for the <userId_actionId> format (where 0004 is the actionId), it would be great to also be able to specify a "fuzzy range", e.g. ????_0004, ..., ????_0099.

      See initial discussion here: http://search-hadoop.com/m/WVLJdX0Z65

      Note: it is currently possible to provide multiple fuzzy row rules to the existing FuzzyRowFilter, but when the range is big (contains thousands of values) this is not efficient.
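
      To make the cost of the current workaround concrete, here is a minimal sketch of expanding a fuzzy range into one rule per value. It is plain Java with no HBase dependency; the `FuzzyRuleExpansion` and `Rule` names are hypothetical, and the mask convention (0 = fixed byte, 1 = any byte) is assumed to mirror the one used by the existing FuzzyRowFilter.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of HBase: shows how a range of actionIds
// turns into one fuzzy rule per value when ranges are not supported.
class FuzzyRuleExpansion {

    /** One fuzzy rule: a row-key template plus a mask (assumed: 0 = fixed, 1 = any byte). */
    static final class Rule {
        final byte[] key;
        final byte[] mask;

        Rule(byte[] key, byte[] mask) {
            this.key = key;
            this.mask = mask;
        }
    }

    /** Builds one rule per actionId in [first, last] for the <userId_actionId> layout. */
    static List<Rule> expand(int first, int last) {
        // Four fuzzy userId bytes, then five fixed bytes: '_' plus a 4-digit actionId.
        byte[] mask = {1, 1, 1, 1, 0, 0, 0, 0, 0};
        List<Rule> rules = new ArrayList<>();
        for (int id = first; id <= last; id++) {
            String template = String.format(Locale.ROOT, "????_%04d", id);
            rules.add(new Rule(template.getBytes(StandardCharsets.US_ASCII), mask.clone()));
        }
        return rules;
    }

    public static void main(String[] args) {
        List<Rule> rules = expand(4, 99);
        System.out.println("rules needed for ????_0004 .. ????_0099: " + rules.size());
    }
}
```

      The number of rules grows linearly with the width of the range, and every rule has to be considered for each row the filter sees, which is the inefficiency described above.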

      The filter should perform efficient fast-forwarding during the scan (this is what distinguishes it from a regex row filter).
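
      As an illustration of what fast-forwarding could mean for a fuzzy range, the sketch below computes the smallest row key at or after the current one that can still match, so that a scan could seek straight to it. It is not taken from the attached patches. It assumes a simplified layout: fixed-length row keys, a fuzzy prefix of `prefixLen` bytes, and a fixed suffix that must lie between `lo` and `hi` (equal length, compared as unsigned bytes). All names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not HBase code. Assumes fixed-length row keys made of
// a fuzzy prefix followed by a fixed suffix that must fall within [lo, hi].
class FuzzyRangeSketch {

    /** Lexicographic comparison that treats bytes as unsigned. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    }

    /** True if the row has the expected length and its suffix lies within [lo, hi]. */
    static boolean matches(byte[] row, int prefixLen, byte[] lo, byte[] hi) {
        if (row.length != prefixLen + lo.length) return false;
        byte[] suffix = Arrays.copyOfRange(row, prefixLen, row.length);
        return compare(suffix, lo) >= 0 && compare(suffix, hi) <= 0;
    }

    /** Adds one to a fixed-length unsigned big-endian value; returns false on overflow. */
    static boolean increment(byte[] value) {
        for (int i = value.length - 1; i >= 0; i--) {
            if (value[i] != (byte) 0xFF) {
                value[i]++;
                return true;
            }
            value[i] = 0;
        }
        return false;
    }

    /** Smallest matching key >= row, or null if no later key can match. */
    static byte[] nextCandidate(byte[] row, int prefixLen, byte[] lo, byte[] hi) {
        if (lo.length != hi.length || row.length != prefixLen + lo.length) {
            throw new IllegalArgumentException("sketch assumes fixed-length keys and bounds");
        }
        if (matches(row, prefixLen, lo, hi)) return row.clone();
        byte[] prefix = Arrays.copyOfRange(row, 0, prefixLen);
        byte[] suffix = Arrays.copyOfRange(row, prefixLen, row.length);
        // Suffix above the range: no key with this prefix can match any more,
        // so move to the next prefix. Suffix below the range: keep the prefix.
        if (compare(suffix, hi) > 0 && !increment(prefix)) return null;
        byte[] next = Arrays.copyOf(prefix, prefixLen + lo.length);
        System.arraycopy(lo, 0, next, prefixLen, lo.length);
        return next;
    }

    public static void main(String[] args) {
        byte[] lo = "_0004".getBytes(StandardCharsets.US_ASCII);
        byte[] hi = "_0099".getBytes(StandardCharsets.US_ASCII);
        for (String key : new String[] {"ab12_0001", "ab12_0050", "ab12_0100"}) {
            byte[] next = nextCandidate(key.getBytes(StandardCharsets.US_ASCII), 4, lo, hi);
            System.out.println(key + " -> " + new String(next, StandardCharsets.US_ASCII));
        }
    }
}
```

      A general implementation would also have to handle fixed bytes interleaved with fuzzy ones and variable-length keys; this sketch covers only the prefix/suffix case from the example above.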

      While such functionality may seem like a proper fit for a custom filter (i.e. one not included in the standard filter set), it looks like the filter may be very reusable. We may judge based on the implementation that will hopefully be added.

      Attachments

        1. HBASE-6618_5.patch
          119 kB
          Alex Baranau
        2. HBASE-6618_4.patch
          98 kB
          Alex Baranau
        3. HBASE-6618_3.path
          88 kB
          Alex Baranau
        4. HBASE-6618_2.path
          88 kB
          Alex Baranau
        5. HBASE-6618.patch
          88 kB
          Alex Baranau
        6. HBASE-6618-algo-desc-bits.png
          146 kB
          Alex Baranau
        7. HBASE-6618-algo.patch
          8 kB
          Alex Baranau

            People

              Assignee: Unassigned
              Reporter: Alex Baranau (alexb)
              Votes: 3
              Watchers: 19