We observed “responseTooSlow” logs for Get requests in our production clusters. Some Get requests were even answered only after 10 seconds.
The affected Get requests specified a time range, and the target rows have many columns, each with several versions.
We reproduced this issue and found that it happens only when scanning the memstore. After flushing the HStore, the slow responses disappeared and the same Get requests were all answered very quickly.
We investigated this case and found that the performance difference between the memstore scanner and the HFile scanner is caused by the number of reseek operations executed while scanning. When a store scanner needs to reseek to the next column, the HFile scanner decides whether it really has to reseek by checking whether the seek point is in the current block, whereas the memstore scanner always reseeks without making any such decision. In our case, almost all columns in the memstore have timestamps older than the time range of the scan (Get), so the number of reseek operations is roughly equal to the number of columns. This sporadically increases the response time of Get requests.
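The difference described above can be illustrated with a minimal sketch. It is not HBase code: the class, the method `decide`, and the string keys are hypothetical, and real cells are compared with a cell comparator rather than `String.compareTo`. The only assumption is that a scanner which knows the first key of the next block can compare the seek point against it, while a scanner without that information cannot.

```java
// Hypothetical sketch, not HBase code: models the skip-vs-seek decision
// using plain string keys of the form "row/column".
class BlockAwareReseekSketch {
    enum Decision { SKIP, SEEK }

    /**
     * @param seekKey           key the store scanner wants to move to
     * @param nextBlockFirstKey first key of the next block, or null when the
     *                          scanner has no such information (the memstore case)
     */
    static Decision decide(String seekKey, String nextBlockFirstKey) {
        if (nextBlockFirstKey == null) {
            // No block boundary is known, so the only option is to reseek.
            return Decision.SEEK;
        }
        // Seek point lies before the next block starts: it is in the current
        // block, so stepping forward cell by cell is cheaper than a seek.
        return seekKey.compareTo(nextBlockFirstKey) < 0 ? Decision.SKIP : Decision.SEEK;
    }

    public static void main(String[] args) {
        System.out.println(decide("row1/c0002", "row1/c0100")); // target in current block
        System.out.println(decide("row1/c0200", "row1/c0100")); // target in a later block
        System.out.println(decide("row1/c0002", null));         // no block information
    }
}
```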
To improve the reseek operation of the memstore scanner, I think it is better to skip than to seek when a reseek is requested and the seek point is quite close to the cell the scanner is currently pointing at. (In fact, I changed MatchCode.SEEK_NEXT_COL to MatchCode.SKIP in our case, and the response time of Get became 6x faster than before.) However, we cannot decide whether the seek point is close to the current cell, because the memstore scanner has no information such as the next block index.
Scan.HINT_LOOKAHEAD (see HBASE-13109) was introduced to handle cases like this, and it may be deprecated someday. Still, I think that hint is useful for the memstore scanner: it can try skipping first, before reseeking, and with this option we can make the reseek operations of the memstore scanner smarter.
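A rough sketch of the "try to skip first, then reseek" idea follows. It is not the patch itself and does not use HBase classes: the memstore is modeled as a `ConcurrentSkipListSet` of integer keys (`column * 100 + version`, assuming fewer than 100 versions per column), and the `lookahead` parameter only stands in for the value carried by the hint. All names here are hypothetical.

```java
import java.util.Iterator;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical sketch, not the actual patch: a scanner over a sorted set
// that skips up to `lookahead` cells before falling back to a real reseek.
class LookaheadReseekSketch {
    private final NavigableSet<Integer> cells;
    private Iterator<Integer> it;
    private Integer current;
    int skips = 0;    // cells stepped over with next()
    int reseeks = 0;  // fallbacks to a search in the sorted set

    LookaheadReseekSketch(NavigableSet<Integer> cells) {
        this.cells = cells;
        this.it = cells.iterator();
        this.current = it.hasNext() ? it.next() : null;
    }

    Integer peek() { return current; }

    /** Move to the first key >= target, skipping at most `lookahead` cells first. */
    void reseek(int target, int lookahead) {
        for (int i = 0; i < lookahead && current != null && current < target; i++) {
            current = it.hasNext() ? it.next() : null;
            skips++;
        }
        if (current != null && current < target) {
            // Target was not reached within the lookahead budget: do a real seek.
            it = cells.tailSet(target, true).iterator();
            current = it.hasNext() ? it.next() : null;
            reseeks++;
        }
    }

    /** Simulates "seek to next column" on every column; returns {skips, reseeks}. */
    static int[] run(int columns, int versions, int lookahead) {
        NavigableSet<Integer> cells = new ConcurrentSkipListSet<>();
        for (int c = 0; c < columns; c++) {
            for (int v = 0; v < versions; v++) {
                cells.add(c * 100 + v);
            }
        }
        LookaheadReseekSketch scanner = new LookaheadReseekSketch(cells);
        while (scanner.peek() != null) {
            int nextColumnStart = (scanner.peek() / 100 + 1) * 100;
            scanner.reseek(nextColumnStart, lookahead);
        }
        return new int[] { scanner.skips, scanner.reseeks };
    }

    public static void main(String[] args) {
        for (int lookahead : new int[] { 0, 2, 5 }) {
            int[] r = run(50, 3, lookahead);
            System.out.println("lookahead=" + lookahead + " skips=" + r[0] + " reseeks=" + r[1]);
        }
    }
}
```

In this toy model with 50 columns of 3 versions each, a lookahead of 0 needs one reseek per column, while a lookahead larger than the number of versions reaches every next column by skipping alone. A lookahead that is too small pays for both the skips and the reseek, so the value has to be chosen with the typical number of versions per column in mind.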
I tested this patch in our case and got the same result as when I changed the match code (mentioned above).