[HBASE-14221] Reduce the number of time row comparison is done in a Scan - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.2.0, 1.3.0, 1.0.3, 1.1.3, 2.0.0
Component/s: Scanners
Labels: None
Hadoop Flags: Reviewed

Description

While profiling with the PE (PerformanceEvaluation) tool, we found the following.
Currently, a simple Scan performs row comparisons in 3 places.
1) ScanQueryMatcher

    int ret = this.rowComparator.compareRows(curCell, cell);
    if (!this.isReversed) {
      if (ret <= -1) {
        return MatchCode.DONE;
      } else if (ret >= 1) {
        // could optimize this, if necessary?
        // Could also be called SEEK_TO_CURRENT_ROW, but this
        // should be rare/never happens.
        return MatchCode.SEEK_NEXT_ROW;
      }
    } else {
      if (ret <= -1) {
        return MatchCode.SEEK_NEXT_ROW;
      } else if (ret >= 1) {
        return MatchCode.DONE;
      }
    }

2) In StoreScanner.next(), when starting to scan a row

    if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) || matcher.curCell == null ||
        isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) {
      this.countPerRow = 0;
      matcher.setToNewRow(peeked);
    }

This check exists specifically to see whether we are on a new row.
3) In HRegion

          scannerContext.setKeepProgress(true);
          heap.next(results, scannerContext);
          scannerContext.setKeepProgress(tmpKeepProgress);

          nextKv = heap.peek();
          moreCellsInRow = moreCellsInRow(nextKv, currentRowCell);

Here again we need to be careful in the multi-CF case. I tried to solve this for multi-CF, but there are a lot of cases to handle. At least for the single-CF case, though, I think these comparisons can be reduced.
In the single-CF case, the SQM (ScanQueryMatcher) code pasted above already tells us whether we have crossed a row boundary. That comparison is definitely needed.
With a single CF, HRegion has only one element in the heap, so the 3rd comparison can surely be avoided if StoreScanner.next() ended because the SQM returned MatchCode.DONE.
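
To make the reasoning about the 3rd comparison concrete, here is a minimal sketch. It is a toy model, not HBase code and not the attached patch: each cell is represented only by its row key, and `ToyStoreScanner`, `endedOnRowBoundary`, `batchLimit` and `scan` are hypothetical names introduced for illustration. The scanner records whether its last next() stopped because the matcher-side check saw a new row, and the region-level loop skips its own row comparison when that flag is set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model, not HBase code. Each cell is represented only by its row key. */
public class SingleCfRegionSketch {

    /** Hypothetical stand-in for a StoreScanner over a single column family. */
    static final class ToyStoreScanner {
        private final List<String> cellRows;
        private final int batchLimit;
        private int pos = 0;
        /** True when the last next() stopped because a new row was seen (the DONE case). */
        boolean endedOnRowBoundary = false;

        ToyStoreScanner(List<String> cellRows, int batchLimit) {
            this.cellRows = cellRows;
            this.batchLimit = batchLimit;
        }

        String peek() {
            return pos < cellRows.size() ? cellRows.get(pos) : null;
        }

        /** Adds cells of the current row to out, stopping at a new row or at the batch limit. */
        void next(List<String> out) {
            endedOnRowBoundary = false;
            String row = peek();
            while (pos < cellRows.size() && out.size() < batchLimit) {
                // Matcher-side comparison (place 1 in the description); always required.
                if (!cellRows.get(pos).equals(row)) {
                    endedOnRowBoundary = true;
                    return;
                }
                out.add(cellRows.get(pos++));
            }
        }
    }

    /** Region-level loop. Returns {rowsSeen, regionLevelRowComparisons}. */
    static int[] scan(List<String> cellRows, int batchLimit, boolean useHint) {
        ToyStoreScanner scanner = new ToyStoreScanner(cellRows, batchLimit);
        int rows = 0;
        int compares = 0;
        while (scanner.peek() != null) {
            String currentRow = scanner.peek();
            scanner.next(new ArrayList<>());
            String nextRow = scanner.peek();
            boolean moreCellsInRow;
            if (nextRow == null || (useHint && scanner.endedOnRowBoundary)) {
                moreCellsInRow = false; // answer already known, no comparison needed
            } else {
                compares++; // place 3 in the description
                moreCellsInRow = nextRow.equals(currentRow);
            }
            if (!moreCellsInRow) {
                rows++;
            }
        }
        return new int[] {rows, compares};
    }

    public static void main(String[] args) {
        List<String> cells = Arrays.asList("r1", "r1", "r1", "r2", "r3", "r3");
        int[] plain = scan(cells, 2, false);
        int[] hinted = scan(cells, 2, true);
        System.out.println("rows=" + plain[0] + ", region compares without hint=" + plain[1]);
        System.out.println("rows=" + hinted[0] + ", region compares with hint=" + hinted[1]);
    }
}
```

In this sketch the comparison is still performed when next() stopped for another reason (here, the hypothetical batch limit), which is the situation where the row may genuinely continue.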

As for the 2nd compareRows, done in StoreScanner.next(): even that can be avoided if we know that the previous next() call ended because of a new row. With all of these changes, the share of compareRows in the profiler dropped from 19% to 13%. Initially we can solve this for the single-CF case, and later extend it to multi-CF cases.
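
The idea for the 2nd comparison can be sketched in the same spirit. Again this is a toy model under stated assumptions, not the StoreScanner implementation or the attached patch: cells are plain row keys, and `prevEndedOnNewRow`, `rememberBoundary`, `batchLimit` and `countCompares` are hypothetical names. The scanner remembers that the previous next() ended on a row boundary, so the following call can reset its per-row state without comparing rows again.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model, not HBase code: counts the new-row checks made at the start of next(). */
public class StoreScannerNewRowSketch {
    private final List<String> cellRows; // each cell is represented by its row key
    private final int batchLimit;
    private final boolean rememberBoundary;
    private int pos = 0;
    private String matcherRow = null;          // row the matcher is currently set to
    private boolean prevEndedOnNewRow = false; // set when the last next() hit a new row
    int startOfNextCompares = 0;
    int countPerRow = 0;

    StoreScannerNewRowSketch(List<String> cellRows, int batchLimit, boolean rememberBoundary) {
        this.cellRows = cellRows;
        this.batchLimit = batchLimit;
        this.rememberBoundary = rememberBoundary;
    }

    /** Consumes up to batchLimit cells of one row; returns 0 once the scanner is exhausted. */
    int next() {
        if (pos >= cellRows.size()) {
            return 0;
        }
        String peeked = cellRows.get(pos);
        boolean newRow;
        if (matcherRow == null || (rememberBoundary && prevEndedOnNewRow)) {
            newRow = true; // known from the previous call, no comparison needed
        } else {
            startOfNextCompares++; // place 2 in the description
            newRow = !peeked.equals(matcherRow);
        }
        if (newRow) {
            countPerRow = 0;
            matcherRow = peeked;
        }
        prevEndedOnNewRow = false;
        int consumed = 0;
        while (pos < cellRows.size() && consumed < batchLimit) {
            // Matcher-side comparison (place 1 in the description); always required.
            if (!cellRows.get(pos).equals(matcherRow)) {
                prevEndedOnNewRow = true;
                break;
            }
            pos++;
            consumed++;
            countPerRow++;
        }
        return consumed;
    }

    /** Runs a full scan and returns how many start-of-next() row comparisons were made. */
    static int countCompares(List<String> cellRows, int batchLimit, boolean rememberBoundary) {
        StoreScannerNewRowSketch s =
            new StoreScannerNewRowSketch(cellRows, batchLimit, rememberBoundary);
        while (s.next() > 0) {
            // keep scanning until exhausted
        }
        return s.startOfNextCompares;
    }

    public static void main(String[] args) {
        List<String> cells = Arrays.asList("r1", "r1", "r1", "r2", "r3", "r3");
        System.out.println("without flag: " + countCompares(cells, 2, false));
        System.out.println("with flag:    " + countCompares(cells, 2, true));
    }
}
```

The numbers produced by these toy counters only illustrate the mechanism; they say nothing about the 19% to 13% profiler figures reported above.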

Attachments

- withoutmatchingRowspatch.png, 14/Aug/15 12:53, 36 kB, ramkrishna.s.vasudevan
- withmatchingRowspatch.png, 14/Aug/15 12:53, 23 kB, ramkrishna.s.vasudevan
- HBASE-14221-branch-1.patch, 06/Jan/16 08:46, 4 kB, ramkrishna.s.vasudevan
- HBASE-14221.patch, 14/Aug/15 12:51, 17 kB, ramkrishna.s.vasudevan
- HBASE-14221_9.patch, 05/Jan/16 08:28, 5 kB, ramkrishna.s.vasudevan
- HBASE-14221_6.patch, 29/Sep/15 10:35, 22 kB, ramkrishna.s.vasudevan
- HBASE-14221_1.patch, 17/Aug/15 11:27, 19 kB, ramkrishna.s.vasudevan
- HBASE-14221_1.patch, 17/Aug/15 13:00, 19 kB, ramkrishna.s.vasudevan
- 14221-0.98-takeALook.txt, 07/Oct/15 06:06, 4 kB, Lars Hofhansl

People

Assignee: ramkrishna.s.vasudevan
Reporter: ramkrishna.s.vasudevan
Votes: 0
Watchers: 7

Dates

Created: 14/Aug/15 05:29
Updated: 27/Jan/16 15:28
Resolved: 19/Jan/16 05:22