Details
Improvement
Status: Closed
Minor
Resolution: Fixed
0.94.2
None
Reviewed
Description
When short-circuit reads are enabled (dfs.client.read.shortcircuit = true), reading with checksums enabled (dfs.client.read.shortcircuit.skip.checksum = false) follows a completely different, and much slower, code path than reading with checksums disabled: BlockReaderLocal uses something called a "slow buffer", which is, unsurprisingly, slow. My tests show that this path is actually slower than having short-circuit reads disabled altogether.
Therefore, I think Section 11.5.1 of the HBase documentation should recommend setting hbase.regionserver.checksum.verify to true when using short-circuit reads. I'd suggest the following:
"For optimal performance when short-circuit reads are enabled, it is recommended that HDFS checksums be disabled. To maintain data integrity with HDFS checksums disabled, HBase can be configured to write its own checksums into its data blocks and verify against these. See Section 11.4.9, 'hbase.regionserver.checksum.verify'."
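For illustration, the combination described above might look roughly like this in hbase-site.xml. This is only a sketch using the three properties named in this issue; the file placement is an assumption, and other short-circuit settings that a given Hadoop or HBase version may require are not shown.

```xml
<!-- Sketch only: property names are taken from this issue; placing them in
     hbase-site.xml is an assumption, and your version may need further settings. -->
<configuration>
  <!-- Enable short-circuit (local) reads. -->
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <!-- Skip HDFS checksum verification on short-circuit reads,
       avoiding the slow BlockReaderLocal path described above. -->
  <property>
    <name>dfs.client.read.shortcircuit.skip.checksum</name>
    <value>true</value>
  </property>
  <!-- Have HBase verify its own block checksums to preserve data integrity. -->
  <property>
    <name>hbase.regionserver.checksum.verify</name>
    <value>true</value>
  </property>
</configuration>
```

With HDFS checksums skipped, the third property is what keeps corruption detectable, so the last two settings should be changed together.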