[HBASE-21734] Some optimization in FilterListWithOR - ASF JIRA


Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0.0-alpha-1, 2.2.0, 1.4.10, 2.1.3, 2.0.5
Component/s: None
Labels:
None

Hadoop Flags:

Reviewed
Release Note:

After HBASE-21620, FilterListWithOR became noticeably slower because we now need to merge every sub-filter's return code (RC). Before HBASE-21620 we skipped many of those RC merges, but that logic was wrong. So here we take another approach to recover performance: removing the call to KeyValueUtil#toNewKeyCell.
Anoop Sam John pointed out that KeyValueUtil#toNewKeyCell used to save some GC: by copying the key part of a cell into a standalone byte[], the filter list no longer references the block the cell points into, so the upper layer can release the data block quickly. After HBASE-21620, however, prevCellList is updated for every encountered cell, so a cell's lifetime in FilterList's prevCellList is much shorter. We therefore just keep a reference to the cell to save CPU.
BTW, we also removed all Arrays stream usage in the filter list, because it was also quite time-consuming in our tests.
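A minimal sketch of the two changes described in the release note, using hypothetical stand-in types (`BlockCell`, `PrevCellSketch`, `copyKey`, `anyTrue`) rather than HBase's real `Cell`, `KeyValueUtil`, or `FilterList` classes. It contrasts storing a copied key (the old `toNewKeyCell`-style approach) with storing a plain reference, and writes a simple check as a loop instead of an `Arrays.stream` pipeline. It is an illustration under these assumptions, not the actual patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PrevCellSketch {
  // Hypothetical stand-in for an HBase Cell backed by a shared data block.
  static final class BlockCell {
    final byte[] block;   // the (possibly large) block this cell points into
    final int keyOffset;
    final int keyLength;

    BlockCell(byte[] block, int keyOffset, int keyLength) {
      this.block = block;
      this.keyOffset = keyOffset;
      this.keyLength = keyLength;
    }
  }

  // Old approach, in the spirit of KeyValueUtil#toNewKeyCell: copy only the key
  // into a fresh array so the list stops pinning the data block, at the cost of
  // one allocation and one copy per cell.
  static BlockCell copyKey(BlockCell c) {
    byte[] key = Arrays.copyOfRange(c.block, c.keyOffset, c.keyOffset + c.keyLength);
    return new BlockCell(key, 0, key.length);
  }

  private final List<BlockCell> prevCellList = new ArrayList<>();
  private final boolean copyKeys; // hypothetical switch between the two strategies

  PrevCellSketch(int numSubFilters, boolean copyKeys) {
    this.copyKeys = copyKeys;
    for (int i = 0; i < numSubFilters; i++) {
      prevCellList.add(null);
    }
  }

  // Called for every encountered cell. Because the slot is overwritten each time,
  // a plain reference only lives until the next cell arrives.
  void updatePrevCell(int index, BlockCell cell) {
    prevCellList.set(index, copyKeys ? copyKey(cell) : cell);
  }

  BlockCell prevCell(int index) {
    return prevCellList.get(index);
  }

  // Plain loop instead of something like a stream-based anyMatch.
  static boolean anyTrue(boolean[] flags) {
    for (boolean f : flags) {
      if (f) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    byte[] block = new byte[64 * 1024];
    BlockCell cell = new BlockCell(block, 128, 16);

    PrevCellSketch byRef = new PrevCellSketch(2, false);
    byRef.updatePrevCell(0, cell);
    System.out.println(byRef.prevCell(0).block == block);     // true: still pins the block

    PrevCellSketch byCopy = new PrevCellSketch(2, true);
    byCopy.updatePrevCell(0, cell);
    System.out.println(byCopy.prevCell(0).block.length);      // 16: key-only copy

    System.out.println(anyTrue(new boolean[] {false, true})); // true
  }
}
```

The trade-off in the release note shows up directly: the copy releases the block early but allocates per cell, while the reference is free but pins the block until the slot is overwritten, which after HBASE-21620 happens on every cell.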


Description

In HBASE-21620, KarthickRam and mohamed.meeran complained that the performance of their filter list degraded after that patch (see [1]).

I wrote a UT for this and ran it on my host, and the regression is real. I guess there may be two reasons:
1. The comparator.compare(nextKV, cell) > 0 check in StoreScanner;
2. A filter list combined with OR chooses the minimal forward step among all sub-filters. This patch imposes stricter restrictions on all sub-filters, including those for which calculateReturnCodeByPrevCellAndRC returns a non-null RC (previously we skipped merging such a sub-filter's RC, but that was wrong in some cases), and it merges the RCs of all sub-filters, which also costs time.
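To make reason 2 concrete, here is a deliberately simplified model of OR merging. It is a sketch under assumptions, not HBase's `FilterListWithOR#mergeReturnCode`: the `Advance` enum and `Verdict` class are hypothetical stand-ins for `Filter.ReturnCode`, and seek hints and the real merge table are omitted. It shows why an OR list must visit every sub-filter: the cell is kept if any sub-filter keeps it, and the scanner may only advance by the smallest step any sub-filter allows.

```java
import java.util.List;

final class OrMergeSketch {
  // How far the scanner may move after the current cell, ordered from the
  // smallest step to the largest (hypothetical simplification of ReturnCode).
  enum Advance { NEXT_CELL, NEXT_COL, NEXT_ROW }

  // A sub-filter's verdict: keep the cell or not, and how far to advance.
  static final class Verdict {
    final boolean include;
    final Advance advance;

    Verdict(boolean include, Advance advance) {
      this.include = include;
      this.advance = advance;
    }
  }

  // OR semantics: include if any sub-filter includes, and advance by the
  // minimal step, otherwise the list could skip cells another sub-filter
  // still wants to see. Every verdict is merged; nothing is short-circuited.
  // An empty list yields (false, NEXT_ROW) in this sketch.
  static Verdict mergeOr(List<Verdict> verdicts) {
    boolean include = false;
    Advance advance = Advance.NEXT_ROW;
    for (Verdict v : verdicts) {
      include |= v.include;
      if (v.advance.compareTo(advance) < 0) {
        advance = v.advance;
      }
    }
    return new Verdict(include, advance);
  }

  public static void main(String[] args) {
    Verdict merged = mergeOr(List.of(
        new Verdict(false, Advance.NEXT_ROW),  // e.g. a filter rejecting the whole row
        new Verdict(true, Advance.NEXT_COL))); // e.g. a filter wanting this column
    System.out.println(merged.include + " " + merged.advance); // true NEXT_COL
  }
}
```

The cost described above comes from the loop body running for every sub-filter on every cell; skipping some of those merges is faster but can pick a step that is too large.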

The former does not seem to be the main problem, because the UT still takes ~3s even if I comment out the compare. The second does have a real impact: if I skip merging a sub-filter's RC whenever calculateReturnCodeByPrevCellAndRC returns a non-null RC, the UT takes ~1s. That is an improvement, but the skipping logic itself is incorrect.

1. https://issues.apache.org/jira/browse/HBASE-21620?focusedCommentId=16737100&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16737100

Attachments


- columnkey.txt (15.11 MB, 17/Jan/19 13:42, Zheng Hu)
- HBASE-21734.branch-1.v1.patch (1 kB, 18/Jan/19 14:19, Zheng Hu)
- HBASE-21734.v1.patch (3 kB, 17/Jan/19 14:01, Zheng Hu)
- perf-ut.patch (3 kB, 17/Jan/19 13:45, Zheng Hu)

Issue Links

links to

Review board


People

Assignee: Zheng Hu

Reporter: Zheng Hu

Votes: 0

Watchers: 8

Dates

Created: 17/Jan/19 13:37

Updated: 07/Oct/19 18:30

Resolved: 22/Jan/19 04:11