[HBASE-20636] Introduce two bloom filter type : ROWPREFIX_FIXED_LENGTH and ROWPREFIX_DELIMITED - ASF JIRA


Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0.0-alpha-1, 2.2.0
Component/s: HFile, regionserver, Scanners
Labels: None

Hadoop Flags: Reviewed
Release Note:

Adds two bloom filter types: ROWPREFIX_FIXED_LENGTH and ROWPREFIX_DELIMITED.
1. ROWPREFIX_FIXED_LENGTH: specify the length of the row key prefix.
2. ROWPREFIX_DELIMITED: specify the delimiter that ends the row key prefix.
Both types require their parameter; table creation fails if it is not specified.
Example:
create 't1', {NAME => 'f1', BLOOMFILTER => 'ROWPREFIX_FIXED_LENGTH', CONFIGURATION => {'RowPrefixBloomFilter.prefix_length' => '10'}}
create 't1', {NAME => 'f1', BLOOMFILTER => 'ROWPREFIX_DELIMITED', CONFIGURATION => {'RowPrefixDelimitedBloomFilter.delimiter' => '#'}}
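To make the two parameters concrete, here is a minimal Java sketch of how a row key could be reduced to the key stored in each type of filter. The class and method names (RowPrefixBloomKeys, fixedLengthPrefix, delimitedPrefix) are hypothetical and not taken from HBase, and the sketch assumes that a row shorter than the prefix length, or a row without the delimiter, contributes its whole row key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not HBase code: derive the bloom key from a row key.
final class RowPrefixBloomKeys {

    // ROWPREFIX_FIXED_LENGTH: the first prefixLength bytes of the row.
    // Assumption: a shorter row contributes the whole row.
    static byte[] fixedLengthPrefix(byte[] row, int prefixLength) {
        if (prefixLength <= 0) {
            throw new IllegalArgumentException("prefix length must be positive");
        }
        return Arrays.copyOf(row, Math.min(prefixLength, row.length));
    }

    // ROWPREFIX_DELIMITED: the bytes before the first occurrence of the delimiter.
    // Assumption: a row without the delimiter contributes the whole row.
    static byte[] delimitedPrefix(byte[] row, byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        int idx = indexOf(row, delimiter);
        return idx < 0 ? row.clone() : Arrays.copyOf(row, idx);
    }

    // Naive byte-array search; returns the first match position or -1.
    static int indexOf(byte[] array, byte[] target) {
        outer:
        for (int i = 0; i + target.length <= array.length; i++) {
            for (int j = 0; j < target.length; j++) {
                if (array[i + j] != target[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] row = "user000042#1537490000000".getBytes(StandardCharsets.UTF_8);
        byte[] delimiter = "#".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(fixedLengthPrefix(row, 10), StandardCharsets.UTF_8)); // user000042
        System.out.println(new String(delimitedPrefix(row, delimiter), StandardCharsets.UTF_8)); // user000042
    }
}
```

With prefix_length 10 and delimiter '#' (the values from the examples above), both types map this row to the same prefix; they differ when user IDs have varying lengths, where only the delimited form still isolates the user ID.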


Description

HBase uses Bloom filters (ROW and ROWCOL) to skip store files that cannot contain the requested row (or row and column), which improves read performance. However, these filters can only be used by Get, not by Scan.
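Why the existing filters only help Get can be shown with a toy Bloom filter in Java. This is an illustrative sketch, not HBase's Bloom filter implementation: the class names, the FNV-1a based double hashing, and the 1024-bit / 3-hash sizing are all assumptions. Each hypothetical store file keeps a filter over its row keys; a Get supplies one exact row to query it with, while a Scan range does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.List;

final class RowBloomGetDemo {

    // Toy Bloom filter: k bit positions per key via double hashing h1 + i * h2 (mod m).
    static final class SimpleBloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        SimpleBloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        // 32-bit FNV-1a, with a seed mixed into the offset basis.
        private static int fnv1a(byte[] key, int seed) {
            int h = 0x811c9dc5 ^ seed;
            for (byte b : key) {
                h ^= (b & 0xff);
                h *= 0x01000193;
            }
            return h;
        }

        private int position(byte[] key, int i) {
            int h1 = fnv1a(key, 0);
            // Odd step: with a power-of-two numBits the k probes hit distinct bits.
            int h2 = fnv1a(key, 0x5bd1e995) | 1;
            return Math.floorMod(h1 + i * h2, numBits);
        }

        void add(byte[] key) {
            for (int i = 0; i < numHashes; i++) bits.set(position(key, i));
        }

        // false means "definitely absent"; true means "possibly present".
        boolean mightContain(byte[] key) {
            for (int i = 0; i < numHashes; i++) {
                if (!bits.get(position(key, i))) return false;
            }
            return true;
        }
    }

    // Builds a ROW-style filter over the row keys of one hypothetical store file.
    static SimpleBloomFilter fileBloom(String... rows) {
        SimpleBloomFilter f = new SimpleBloomFilter(1024, 3);
        for (String row : rows) f.add(row.getBytes(StandardCharsets.UTF_8));
        return f;
    }

    public static void main(String[] args) {
        List<SimpleBloomFilter> files = List.of(
                fileBloom("user000001#1537488000000", "user000042#1537490000000"),
                fileBloom("user000007#1537489000000"),
                fileBloom("user000099#1537495000000"));
        // A Get names one exact row, so every file's filter can be queried with it.
        byte[] getRow = "user000042#1537490000000".getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < files.size(); i++) {
            System.out.println("file " + i + (files.get(i).mightContain(getRow) ? ": read" : ": skip"));
        }
        // A Scan over [startRow, stopRow) names no single row, so a ROW filter has nothing to test.
    }
}
```

File 0 is always read, because a Bloom filter has no false negatives; files 1 and 2 are skipped unless a false positive occurs.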

At our company (Tencent), many users, such as Tencent Game, need to scan all rows that share a prefix. For example, operational records of game users are written into HBase. Each user has many records, and the row key is constructed as userid + '#' + timestamp, so all records of a given user within a specified period can be read with a single scan.
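A minimal Java sketch of this row key layout and the matching scan range follows, assuming millisecond timestamps zero-padded to 13 digits so that byte order matches time order (the timestamp encoding is an assumption). GameRecordKeys and its methods are hypothetical; with the HBase 2.x client the two rows would be passed to Scan#withStartRow and Scan#withStopRow, where the stop row is exclusive.

```java
// Hypothetical helper for row keys of the form userid + '#' + timestamp.
final class GameRecordKeys {

    // Assumption: fixed-width, zero-padded timestamps keep lexicographic order equal to time order.
    static String rowKey(String userId, long timestampMillis) {
        return userId + "#" + String.format("%013d", timestampMillis);
    }

    // Scan range for one user over [fromMillis, toMillis); the stop row is exclusive.
    static String[] scanRange(String userId, long fromMillis, long toMillis) {
        return new String[] { rowKey(userId, fromMillis), rowKey(userId, toMillis) };
    }

    public static void main(String[] args) {
        String[] range = scanRange("user000042", 1537488000000L, 1537574400000L);
        System.out.println("startRow = " + range[0]); // user000042#1537488000000
        System.out.println("stopRow  = " + range[1]); // user000042#1537574400000
    }
}
```

Both boundaries begin with "user000042#", which is exactly the kind of shared prefix the new filter types can exploit.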

For this scenario, we designed a prefix Bloom filter. If the startRow and stopRow of a Scan share a valid common prefix, the scan can use the Bloom filter to skip store files that cannot contain that prefix, which improves scan performance.
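One way to derive a usable prefix from the scan boundaries is sketched below in Java. It relies on the fact that every row between startRow (inclusive) and stopRow (exclusive) starts with their longest common prefix. The class, its methods, and the usability rule (at least prefix_length common bytes, or a delimiter inside the common prefix) are assumptions for illustration, and per-file Bloom filters are replaced by exact sets, so unlike a real Bloom filter there are no false positives.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: decide whether a Scan can use a row-prefix filter, and with which key.
final class PrefixBloomScanCheck {

    // Longest common prefix; every row in [startRow, stopRow) begins with it.
    // An empty stopRow (unbounded scan) yields an empty prefix, so no filter is usable.
    static byte[] commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) i++;
        return Arrays.copyOf(a, i);
    }

    // ROWPREFIX_FIXED_LENGTH: usable only if the common prefix covers prefixLength bytes.
    static byte[] fixedLengthBloomKey(byte[] startRow, byte[] stopRow, int prefixLength) {
        byte[] common = commonPrefix(startRow, stopRow);
        return common.length >= prefixLength ? Arrays.copyOf(common, prefixLength) : null;
    }

    // ROWPREFIX_DELIMITED: usable only if the common prefix contains the delimiter.
    static byte[] delimitedBloomKey(byte[] startRow, byte[] stopRow, byte[] delimiter) {
        byte[] common = commonPrefix(startRow, stopRow);
        int idx = indexOf(common, delimiter);
        return idx >= 0 ? Arrays.copyOf(common, idx) : null;
    }

    static int indexOf(byte[] array, byte[] target) {
        outer:
        for (int i = 0; i + target.length <= array.length; i++) {
            for (int j = 0; j < target.length; j++) {
                if (array[i + j] != target[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] startRow = "user000042#1537488000000".getBytes(StandardCharsets.UTF_8);
        byte[] stopRow = "user000042#1537574400000".getBytes(StandardCharsets.UTF_8);
        byte[] key = delimitedBloomKey(startRow, stopRow, "#".getBytes(StandardCharsets.UTF_8));
        String prefix = key == null ? null : new String(key, StandardCharsets.UTF_8);

        // Stand-in for per-file prefix Bloom filters: the exact prefixes each file holds.
        List<Set<String>> files = List.of(
                Set.of("user000001", "user000042"),
                Set.of("user000007"),
                Set.of("user000042", "user000099"));
        for (int i = 0; i < files.size(); i++) {
            // Without a usable key, every file has to be scanned.
            boolean read = prefix == null || files.get(i).contains(prefix);
            System.out.println("file " + i + (read ? ": scan" : ": skip"));
        }
    }
}
```

Running it prints "file 0: scan", "file 1: skip", "file 2: scan". If the two boundaries belonged to different users (for example user000042 and user000043), the common prefix would contain no '#' and be shorter than 10 bytes, so neither filter type could be applied.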

This feature has now been running on our cluster for over a year, and scan performance in this scenario has more than doubled.

Attachments


HBASE-20636.master.001.patch (24/May/18 11:10, 58 kB, Guangxu Cheng)
HBASE-20636.master.002.patch (24/May/18 16:54, 59 kB, Guangxu Cheng)
HBASE-20636.master.003.patch (29/May/18 12:13, 65 kB, Guangxu Cheng)
HBASE-20636.master.004.patch (21/Sep/18 00:04, 65 kB, Andrew Kyle Purtell)
HBASE-20636.master.005.patch (21/Sep/18 22:56, 65 kB, Andrew Kyle Purtell)

Issue Links

duplicates: HBASE-10342 RowKey Prefix Bloom Filter [Closed]

is related to: HBASE-21520 TestMultiColumnScanner cost long time when using ROWCOL bloom type [Resolved]

relates to:
HBASE-21678 Port HBASE-20636 (Introduce two bloom filter type ROWPREFIX and ROWPREFIX_DELIMITED) to branch-1 [Resolved]
HBASE-21922 BloomContext#sanityCheck may failed when use ROWPREFIX_DELIMITED bloom filter [Resolved]

links to: ReviewBoard (master)

Sub-Tasks

Port HBASE-20636 (Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED) to branch-1 [Resolved] (Andrew Kyle Purtell)


People

Assignee: Guangxu Cheng

Reporter: Guangxu Cheng

Votes: 0

Watchers: 23

Dates

Created: 24/May/18 11:04

Updated: 25/Jul/20 19:56

Resolved: 21/Sep/18 23:10