[HBASE-14397] PrefixFilter doesn't filter all remaining rows if the prefix is longer than rowkey being compared - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.0.0
Fix Version/s: 1.3.0, 2.0.0
Component/s: Filters
Labels: None
Hadoop Flags: Reviewed

Description

PrefixFilter filters rowkeys as follows:

  public boolean filterRowKey(Cell firstRowCell) {
    ...
    int length = firstRowCell.getRowLength();
    if (length < prefix.length) return true; // ===> return directly if the prefix is longer
    ....
    if ((!isReversed() && cmp > 0) || (isReversed() && cmp < 0)) {
      passedPrefix = true;
    }
    filterRow = (cmp != 0);
    return filterRow;
  }

If the prefix is longer than the current rowkey, PrefixFilter#filterRowKey filters the rowkey immediately, without comparing it to the prefix, so the 'passedPrefix' flag is never set even when the current row sorts after the prefix.
For example, if the table contains three rows 'a', 'b' and 'c', and we issue a scan request as:

hbase(main):001:0> scan 'test_table', {STARTROW => 'a', FILTER => "(PrefixFilter ('aa'))"}

The region server will check all three rows before returning, even though 'b' and 'c' sort after 'aa' and cannot match the prefix. In our production environment, a user issued a scan with a PrefixFilter whose prefix was longer than the rowkeys of the following millions of rows, so the region server kept checking rows until it hit a rowkey no shorter than the prefix. This easily makes the client time out. To fix this case, it seems we need to compare the prefix with the rowkey every several rows even when the prefix is longer.
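To illustrate the idea, here is a minimal, self-contained sketch of a row-key check that still compares a short rowkey against the prefix. It is not the HBase class and not necessarily what the attached patch does: the class name PrefixFilterSketch, the String-based rowkeys, and the rowsChecked helper are assumptions made for illustration, and it compares on every row rather than every several rows.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified stand-in for PrefixFilter; it is not the HBase class.
class PrefixFilterSketch {
  private final byte[] prefix;
  private final boolean reversed;
  private boolean passedPrefix = false;

  PrefixFilterSketch(String prefix, boolean reversed) {
    this.prefix = prefix.getBytes(StandardCharsets.UTF_8);
    this.reversed = reversed;
  }

  // Returns true when the row must be filtered out (skipped).
  boolean filterRowKey(String rowKey) {
    byte[] row = rowKey.getBytes(StandardCharsets.UTF_8);
    // Compare only the bytes both arrays have, as unsigned values.
    int common = Math.min(row.length, prefix.length);
    int cmp = 0;
    for (int i = 0; i < common && cmp == 0; i++) {
      cmp = (row[i] & 0xff) - (prefix[i] & 0xff);
    }
    // A row that is a proper prefix of the filter prefix sorts before it.
    if (cmp == 0 && row.length < prefix.length) {
      cmp = -1;
    }
    // Once the scan has moved beyond the prefix, no later row can match.
    if ((!reversed && cmp > 0) || (reversed && cmp < 0)) {
      passedPrefix = true;
    }
    return cmp != 0;
  }

  boolean filterAllRemaining() {
    return passedPrefix;
  }

  // Counts how many rows of a sorted forward scan are examined before the filter stops it.
  static int rowsChecked(String prefix, String... sortedRows) {
    PrefixFilterSketch filter = new PrefixFilterSketch(prefix, false);
    int checked = 0;
    for (String row : sortedRows) {
      if (filter.filterAllRemaining()) {
        break;
      }
      filter.filterRowKey(row);
      checked++;
    }
    return checked;
  }
}
```

With the rows 'a', 'b' and 'c' and the prefix 'aa', this model stops after examining 'b', because 'b' already sorts after the prefix; with the early return shown in the description, all three rows would be examined.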

Attachments

HBASE-14397-trunk-v1.patch (1 kB, 11/Sep/15 01:39, Jianwei Cui)

People

Assignee: Jianwei Cui

Reporter: Jianwei Cui

Votes: 0

Watchers: 5

Dates

Created: 10/Sep/15 13:01

Updated: 01/Jul/22 21:23

Resolved: 20/Jun/16 19:30