- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: None
- Component/s: Client, Performance, Scanners
- Labels: None
- Hadoop Flags: Reviewed
- Release Note: Better performance for small scans (e.g. the scan range is within one data block (64KB)) by setting the 'small' attribute to true on the Scan object.
Review board: https://reviews.apache.org/r/14059/
Performance Improvement
Tests show roughly a 1.5x to 3x improvement for small scans with limit <= 50 at a 100% cache hit ratio.
See the attached picture for more performance test results.
Usage:
Scan scan = new Scan(startRow, stopRow);
scan.setSmall(true);
ResultScanner scanner = table.getScanner(scan);
Set the new 'small' attribute to true on the Scan object; everything else stays the same.
Currently, a single scan operation makes at least 3 RPC calls:
openScanner();
next();
closeScanner();
I think we could reduce this to a single RPC call for small scans to get better performance.
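The round-trip saving can be illustrated with a toy model. The sketch below is not HBase code: `FakeServer`, its `smallScan` method and the row naming are hypothetical stand-ins that only count calls, on the assumption that each call corresponds to one RPC.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SmallScanRpcModel {

    /** Hypothetical in-memory "server" that counts every call made to it. */
    static final class FakeServer {
        private final TreeMap<String, String> rows = new TreeMap<>();
        private final Map<Long, Iterator<String>> scanners = new HashMap<>();
        private long nextId = 0;
        int rpcCount = 0;

        FakeServer(int n) {
            for (int i = 0; i < n; i++) {
                rows.put(String.format("row%04d", i), "v" + i);
            }
        }

        // Regular path, step 1: register a scanner over [start, stop).
        long openScanner(String start, String stop) {
            rpcCount++;
            long id = nextId++;
            scanners.put(id, rows.subMap(start, stop).values().iterator());
            return id;
        }

        // Regular path, step 2: fetch up to `limit` values from an open scanner.
        List<String> next(long id, int limit) {
            rpcCount++;
            Iterator<String> it = scanners.get(id);
            List<String> out = new ArrayList<>();
            while (it.hasNext() && out.size() < limit) {
                out.add(it.next());
            }
            return out;
        }

        // Regular path, step 3: release the scanner.
        void closeScanner(long id) {
            rpcCount++;
            scanners.remove(id);
        }

        // Small-scan path (assumed shape): open, read and close in one call.
        List<String> smallScan(String start, String stop, int limit) {
            rpcCount++;
            List<String> out = new ArrayList<>();
            for (String v : rows.subMap(start, stop).values()) {
                if (out.size() >= limit) break;
                out.add(v);
            }
            return out;
        }
    }

    /** Returns how many calls a regular open/next/close scan costs. */
    static int regularScanRpcs(FakeServer s, String start, String stop, int limit) {
        int before = s.rpcCount;
        long id = s.openScanner(start, stop);
        s.next(id, limit);
        s.closeScanner(id);
        return s.rpcCount - before;
    }

    /** Returns how many calls the one-shot small scan costs. */
    static int smallScanRpcs(FakeServer s, String start, String stop, int limit) {
        int before = s.rpcCount;
        s.smallScan(start, stop, limit);
        return s.rpcCount - before;
    }

    public static void main(String[] args) {
        FakeServer s = new FakeServer(100);
        System.out.println("regular scan calls: " + regularScanRpcs(s, "row0010", "row0020", 50));
        System.out.println("small scan calls:   " + smallScanRpcs(s, "row0010", "row0020", 50));
    }
}
```

In this model the regular path costs 3 calls and the one-shot path costs 1 for the same rows. The one-shot form only fits when the whole result comes back in a single response, which is why it is restricted to small scans.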
Also, using pread is better than seek+read for small scans (for details on this point, see HBASE-7266).
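For readers unfamiliar with the distinction, here is a local-file analogue in plain Java. It is only a sketch: it uses `RandomAccessFile` and `FileChannel` from the JDK rather than the HDFS input stream that HBase reads from, and the helper names `seekThenRead` and `positionalRead` are made up for illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class PreadVsSeekRead {

    /** seek+read: moves the file pointer first, then reads from wherever it points. */
    static byte[] seekThenRead(RandomAccessFile file, long offset, int length) throws IOException {
        byte[] buf = new byte[length];
        file.seek(offset);      // changes shared state on the file handle
        file.readFully(buf);    // reads from the current pointer and advances it
        return buf;
    }

    /** pread: the offset travels with the call; the channel's own position is not modified. */
    static byte[] positionalRead(FileChannel channel, long offset, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length);
        while (buf.hasRemaining()) {
            int n = channel.read(buf, offset + buf.position());
            if (n < 0) {
                throw new EOFException("reached end of file before reading " + length + " bytes");
            }
        }
        return buf.array();
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("pread-demo", ".bin");
        try {
            byte[] data = new byte[256];
            for (int i = 0; i < data.length; i++) {
                data[i] = (byte) i;
            }
            Files.write(path, data);

            try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r");
                 FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
                byte[] a = seekThenRead(raf, 100, 4);
                byte[] b = positionalRead(ch, 100, 4);
                System.out.println("same bytes: " + Arrays.equals(a, b));
                System.out.println("file pointer after seek+read: " + raf.getFilePointer());
                System.out.println("channel position after pread: " + ch.position());
            }
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

Both helpers return the same bytes; the difference is that the positional read leaves the handle's position alone, so a short read at an arbitrary offset does not depend on, or disturb, stream state.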
I implemented this small scan in the patch and ran the following performance test:
a. Environment:
Patch applied to version 0.94;
One regionserver;
One client with 50 concurrent threads;
KV size: 50/100;
100% LRU cache hit ratio;
Random start row for each scan
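The shape of such a load test can be sketched in plain Java. This is not the harness used for the reported numbers: the table is a hypothetical in-memory `ConcurrentSkipListMap` standing in for a fully cached region, the row count and scans per thread are arbitrary, and treating the KV size as value bytes is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;

public class SmallScanLoadSketch {
    static final int THREADS = 50;      // from the environment description
    static final int LIMIT = 50;        // upper bound on rows per scan in the reported test
    static final int VALUE_SIZE = 100;  // assumption: "KV size" read as value bytes

    /** Builds the stand-in table; everything lives in memory, mimicking a 100% cache hit ratio. */
    static ConcurrentSkipListMap<Integer, byte[]> buildTable(int rows) {
        ConcurrentSkipListMap<Integer, byte[]> table = new ConcurrentSkipListMap<>();
        for (int i = 0; i < rows; i++) {
            table.put(i, new byte[VALUE_SIZE]);
        }
        return table;
    }

    /** One small scan: reads up to `limit` rows starting at `startRow`; returns rows read. */
    static int scan(ConcurrentSkipListMap<Integer, byte[]> table, int startRow, int limit) {
        int n = 0;
        for (Map.Entry<Integer, byte[]> e : table.tailMap(startRow, true).entrySet()) {
            if (n >= limit) break;
            n++;
        }
        return n;
    }

    /** Runs `scansPerThread` random-start scans on each worker; returns total rows read. */
    static long run(ConcurrentSkipListMap<Integer, byte[]> table,
                    int threads, int scansPerThread, int limit) throws Exception {
        final int rowCount = table.size();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                Callable<Long> task = () -> {
                    long rows = 0;
                    for (int i = 0; i < scansPerThread; i++) {
                        int start = ThreadLocalRandom.current().nextInt(rowCount);
                        rows += scan(table, start, limit);
                    }
                    return rows;
                };
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        ConcurrentSkipListMap<Integer, byte[]> table = buildTable(100_000);
        int scansPerThread = 1_000;
        long startNanos = System.nanoTime();
        long rows = run(table, THREADS, scansPerThread, LIMIT);
        double seconds = (System.nanoTime() - startNanos) / 1e9;
        System.out.printf("%d scans, %d rows, %.3f s%n",
                (long) THREADS * scansPerThread, rows, seconds);
    }
}
```

A real comparison would replace `scan` with calls through the HBase client, once with the 'small' attribute set and once without, and compare throughput under otherwise identical settings.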
b. Results:
See the attached picture.