[HBASE-22833] MultiRowRangeFilter should provide a method for creating a filter which is functionally equivalent to multiple prefix filters - ASF JIRA


Details

Type: Wish
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 3.0.0-alpha-1
Fix Version/s: 3.0.0-alpha-1, 1.5.0, 2.3.0, 2.0.6, 2.2.1, 2.1.6, 1.4.11
Component/s: Client
Labels: None
Hadoop Flags: Reviewed
Release Note:

Provide a public constructor in the MultiRowRangeFilter class to speed up filtering on multiple row prefixes. It expands the row prefixes into multiple row key ranges handled by MultiRowRangeFilter, which is more efficient.
{code}
public MultiRowRangeFilter(byte[][] rowKeyPrefixes);
{code}


Description

Hi,

I think the current standard way to apply multiple prefix filters is to create a FilterList and add PrefixFilter instances to it:

// MUST_PASS_ONE: a row is returned if it matches any one of the prefixes (logical OR)
FilterList allFilters = new FilterList(FilterList.Operator.MUST_PASS_ONE);
allFilters.addFilter(new PrefixFilter(Bytes.toBytes("123")));
allFilters.addFilter(new PrefixFilter(Bytes.toBytes("456")));
allFilters.addFilter(new PrefixFilter(Bytes.toBytes("678")));
scan.setFilter(allFilters);

(cf. https://stackoverflow.com/questions/41074213/hbase-how-to-specify-multiple-prefix-filters-in-a-single-scan-operation)

However, for a single prefix, HBase provides the Scan.setRowPrefixFilter method.
Rather than adding a filter, this method restricts the scan to a row range by setting a start row (the prefix itself) and a stop row.
The stop row is computed by calculateTheClosestNextRowKeyForPrefix (cf. https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java#L574-L597).
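The stop-row rule can be sketched in plain Java. This is a standalone re-implementation of the idea for illustration, not HBase's code: the prefix is treated as an unsigned big-endian number, trailing 0xFF bytes are dropped (they cannot be incremented without a carry), and the last remaining byte is incremented. The class `PrefixStopRow` and method `closestNextRowKeyForPrefix` are hypothetical names, and returning an empty array to mean "scan to the end of the table" is an assumption modeled on HBase's empty-stop-row convention.

```java
import java.util.Arrays;

// Hypothetical helper illustrating how a row prefix maps to an exclusive stop row.
public class PrefixStopRow {

    /**
     * Returns the exclusive stop row for a prefix: the smallest row key that sorts
     * (unsigned, lexicographically) after every key starting with the prefix.
     * An empty result means "no upper bound", which happens when the prefix is
     * empty or consists only of 0xFF bytes.
     */
    public static byte[] closestNextRowKeyForPrefix(byte[] prefix) {
        // Walk back over trailing 0xFF bytes; incrementing them would overflow.
        int offset = prefix.length;
        while (offset > 0 && prefix[offset - 1] == (byte) 0xFF) {
            offset--;
        }
        if (offset == 0) {
            // Only 0xFF bytes (or empty): nothing larger with this shape, scan to the end.
            return new byte[0];
        }
        // Keep the bytes before the 0xFF run and increment the last one.
        // Java bytes are signed, but (byte) 0x7F + 1 wraps to 0x80, which is the
        // correct next value under unsigned comparison.
        byte[] stop = Arrays.copyOfRange(prefix, 0, offset);
        stop[stop.length - 1]++;
        return stop;
    }
}
```

For example, the prefix `123` yields the stop row `124`, and the prefix `{0x12, 0xFF}` yields `{0x13}`; a scan over `[prefix, stopRow)` therefore returns exactly the rows that start with the prefix.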

MultiRowRangeFilter accepts a list of (start row, stop row) pairs, and calculateTheClosestNextRowKeyForPrefix can compute the stop row corresponding to a given start row (i.e., a prefix).
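The idea can be modeled without HBase on the classpath: each prefix becomes a half-open range `[prefix, stopRow)`, and a row passes if it falls inside any range. That should select exactly the rows that start with any of the prefixes, i.e. what the `MUST_PASS_ONE` FilterList of PrefixFilters selects. The class `PrefixRanges` and its nested `Range` type are hypothetical stand-ins (in HBase the corresponding type is `MultiRowRangeFilter.RowRange`), and the stop-row rule is re-implemented here as an assumption about the behavior described above. `Arrays.compareUnsigned` requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, HBase-free model of expanding row prefixes into row ranges.
public class PrefixRanges {

    /** Half-open range [start, stop); an empty stop means "unbounded". */
    public static final class Range {
        final byte[] start;
        final byte[] stop;

        Range(byte[] start, byte[] stop) {
            this.start = start;
            this.stop = stop;
        }

        boolean contains(byte[] row) {
            // Row keys compare as unsigned bytes, lexicographically.
            return Arrays.compareUnsigned(row, start) >= 0
                && (stop.length == 0 || Arrays.compareUnsigned(row, stop) < 0);
        }
    }

    /** Drops trailing 0xFF bytes, then increments the last byte; empty means unbounded. */
    static byte[] stopRowFor(byte[] prefix) {
        int offset = prefix.length;
        while (offset > 0 && prefix[offset - 1] == (byte) 0xFF) {
            offset--;
        }
        if (offset == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOfRange(prefix, 0, offset);
        stop[stop.length - 1]++;
        return stop;
    }

    /** Expands each prefix into the range of keys that start with it. */
    public static List<Range> toRanges(byte[][] prefixes) {
        List<Range> ranges = new ArrayList<>();
        for (byte[] p : prefixes) {
            ranges.add(new Range(p, stopRowFor(p)));
        }
        return ranges;
    }

    /** A row passes if it lies in at least one range (logical OR). */
    public static boolean matchesAnyRange(List<Range> ranges, byte[] row) {
        for (Range r : ranges) {
            if (r.contains(row)) {
                return true;
            }
        }
        return false;
    }

    /** Reference behavior of the FilterList approach: row starts with at least one prefix. */
    public static boolean startsWithAny(byte[][] prefixes, byte[] row) {
        for (byte[] p : prefixes) {
            if (row.length >= p.length
                && Arrays.equals(Arrays.copyOfRange(row, 0, p.length), p)) {
                return true;
            }
        }
        return false;
    }
}
```

This sketch checks ranges linearly, so it only demonstrates that the range form selects the same rows as the prefix form; it does not model the efficiency gain that the release note attributes to MultiRowRangeFilter.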

I think MultiRowRangeFilter should be able to create this kind of filter (one that is functionally equivalent to multiple prefix filters), and that doing so would be more efficient than the current standard way.

Cheers,


Issue Links

links to

GitHub Pull Request #493


People

Assignee: Itsuki Toyota

Reporter: Itsuki Toyota

Votes: 0

Watchers: 4

Dates

Created: 11/Aug/19 12:53

Updated: 17/Aug/19 00:36

Resolved: 16/Aug/19 03:54