[HBASE-9428] Regex filters are at least an order of magnitude slower since 0.94.3 - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.98.0, 0.94.12, 0.96.0
Component/s: None
Labels: None
Hadoop Flags: Reviewed

Description

I found this issue while debugging a performance problem on an OpenTSDB cluster, which was basically unusable after an upgrade from 0.94.2 to 0.94.6. It was caused by ~~HBASE-7279~~ (ping lhofhansl).

The easiest way to see it is to run a simple single-client PerformanceEvaluation (PE):

$ ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1

Then in the shell do a filter scan (flush the table first and make sure it fits in your block cache if you want stable numbers).

Pre ~~HBASE-7279~~:

hbase(main):028:0> scan 'TestTable', {FILTER => "(RowFilter (=, 'regexstring:0000055872') )"}
ROW                                                 COLUMN+CELL                                                                                                                                         
 0000055872                                         column=info:data, timestamp=1378248850191, value=(blanked)                                                                                                                                    
1 row(s) in 1.2780 seconds

Post ~~HBASE-7279~~:

hbase(main):037:0* scan 'TestTable', {FILTER => "(RowFilter (=, 'regexstring:0000055872') )"}
ROW                                                 COLUMN+CELL                                                                                                                                         
 0000055872                                         column=info:data, timestamp=1378248850191, value=(blanked)                                                                                                                                      
1 row(s) in 24.2940 seconds

I tried several 0.94 releases, up to 0.94.11, and the tip of 0.96. They are all slow like this.

It seems that since that JIRA went in we do a lot more row matching, and running the regex that many times gets very expensive.
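To make the "more row matching" point concrete, here is a minimal, self-contained sketch in plain Java. It is not HBase code and does not reproduce the actual code path changed by HBASE-7279; it only counts how many regex evaluations happen when a row-key regex is run once per row versus several times per row. The class name `RegexMatchCount`, the helper names, and the factor of 20 checks per row are all hypothetical and chosen purely for illustration. The row keys mimic the 10-digit, zero-padded keys visible in the scan output above.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class RegexMatchCount {
    // Number of regex evaluations performed so far (illustrative counter).
    static long evaluations = 0;

    // Decode the row key bytes and run the regex against the resulting string.
    // This mirrors, in spirit only, what a regex-based row comparison has to do.
    static boolean rowMatches(Pattern pattern, byte[] rowKey) {
        evaluations++;
        String key = new String(rowKey, StandardCharsets.ISO_8859_1);
        return pattern.matcher(key).find();
    }

    // Walk `rows` synthetic row keys and evaluate the regex `checksPerRow`
    // times for each one. Returns the number of rows that matched.
    static int scan(Pattern pattern, int rows, int checksPerRow) {
        int matched = 0;
        for (int i = 0; i < rows; i++) {
            byte[] rowKey = String.format("%010d", i).getBytes(StandardCharsets.ISO_8859_1);
            boolean match = false;
            for (int c = 0; c < checksPerRow; c++) {
                match = rowMatches(pattern, rowKey);
            }
            if (match) {
                matched++;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("0000055872");
        int rows = 100_000; // hypothetical table size

        evaluations = 0;
        int matchedOnce = scan(pattern, rows, 1);
        long evalsOnce = evaluations;

        evaluations = 0;
        int matchedRepeated = scan(pattern, rows, 20); // hypothetical repeat factor
        long evalsRepeated = evaluations;

        System.out.println("one check per row: matched=" + matchedOnce + " evaluations=" + evalsOnce);
        System.out.println("20 checks per row: matched=" + matchedRepeated + " evaluations=" + evalsRepeated);
    }
}
```

Both runs return the same matching row, but the second performs 20 times as many decode-and-match operations. The real slowdown depends on how often the filter is actually invoked per row, which this issue text does not spell out.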

Attachments

9428-0.94.txt (04/Sep/13 17:40, 1 kB, Lars Hofhansl)
9428-trunk.txt (04/Sep/13 17:56, 1 kB, Lars Hofhansl)

Sub-Tasks

Improve HBASE-9428 - avoid copying bytes for RegexFilter unless necessary (Status: Closed, Assignee: Lars Hofhansl)

People

Assignee: Lars Hofhansl

Reporter: Jean-Daniel Cryans

Votes: 0

Watchers: 7

Dates

Created: 04/Sep/13 00:49

Updated: 08/Oct/13 02:41

Resolved: 04/Sep/13 18:46