Details
Improvement
Status: Resolved
Major
Resolution: Fixed
2.0.0
None
None
Reviewed
Description
We have the cluster-level config 'hbase.storescanner.use.pread', which sets the ReadType to PREAD when it is not explicitly specified in the Scan object.
In the same way, we can provide a way to make scans use the STREAM read type at cluster level (when it is not specified at the Scan object level).
We do not need any new configs for this. We already have the config 'hbase.storescanner.pread.max.bytes', which specifies when to switch the read type to stream; it defaults to 4 * HFile block size. If one configures this value as <= 0, it means the user wants the switch to happen when the scanner is created itself. With such handling we can support it.
That way, every scan need not set the read type.
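The proposed handling can be sketched roughly as below. This is a minimal, self-contained illustration and not the actual StoreScanner code: the class ReadTypeSketch, the method initialReadType and its parameters are hypothetical names, the enum only mirrors Scan.ReadType, the 64 KB block size is assumed as the default HFile block size, and the precedence of 'hbase.storescanner.use.pread' over the <= 0 check is an assumption.

```java
class ReadTypeSketch {
    /** Stand-in for Scan.ReadType; declared here so the sketch compiles on its own. */
    enum ReadType { DEFAULT, STREAM, PREAD }

    // Assumed default HFile block size (64 KB) and the resulting default for
    // 'hbase.storescanner.pread.max.bytes' (4 * block size = 256 KB).
    static final long DEFAULT_HFILE_BLOCK_SIZE = 64 * 1024;
    static final long DEFAULT_PREAD_MAX_BYTES = 4 * DEFAULT_HFILE_BLOCK_SIZE;

    /**
     * Hypothetical helper: picks the read type a scanner starts with.
     *
     * @param requested     read type set on the Scan object (DEFAULT if not set)
     * @param usePread      value of 'hbase.storescanner.use.pread'
     * @param preadMaxBytes value of 'hbase.storescanner.pread.max.bytes'
     */
    static ReadType initialReadType(ReadType requested, boolean usePread, long preadMaxBytes) {
        if (requested != ReadType.DEFAULT) {
            return requested;            // an explicit choice on the Scan always wins
        }
        if (usePread) {
            return ReadType.PREAD;       // cluster-level pread (assumed to take precedence)
        }
        if (preadMaxBytes <= 0) {
            return ReadType.STREAM;      // proposed: switch at scanner creation itself
        }
        return ReadType.DEFAULT;         // start with pread, switch to stream after the threshold
    }

    public static void main(String[] args) {
        System.out.println(initialReadType(ReadType.DEFAULT, false, 0));
        System.out.println(initialReadType(ReadType.DEFAULT, false, DEFAULT_PREAD_MAX_BYTES));
        System.out.println(initialReadType(ReadType.PREAD, false, 0));
    }
}
```

With this handling, a cluster that sets 'hbase.storescanner.pread.max.bytes' to 0 gets stream reads from the start for any scan that leaves the read type unset, while a scan that explicitly asks for PREAD still gets it.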
The issue is that on cloud-storage-based systems, stream reads might be better. We introduced the PRead-based scan with tests on HDFS-based storage. In my customer's case, Azure storage is in place and the WASB driver is being used. There is a read-ahead mechanism there (an entire block of a blob is read in one REST call) and the WASB driver buffers it. This helps a lot with longer scans. Yes, with the config 'hbase.storescanner.pread.max.bytes' we can make the switch happen early, but it is better to go the 1.x way, where the scan starts with stream read itself.