Index: src/docbkx/performance.xml
===================================================================
--- src/docbkx/performance.xml (revision 1413288)
+++ src/docbkx/performance.xml (working copy)
@@ -208,38 +208,10 @@
-
- HDFS Configuration
-
- Leveraging local data
-Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via
-HDFS-2246,
-it is possible for the DFSClient to take a "short circuit" and
-read directly from disk instead of going through the DataNode when the
-data is local. What this means for HBase is that the RegionServers can
-read directly off their machine's disks instead of having to open a
-socket to talk to the DataNode, the former being generally much
-faster (see JD's Performance Talk).
-Also see HBase, mail # dev - read short circuit thread for
-more discussion around short circuit reads.
-
-To enable "short circuit" reads, you must set two configurations.
-First, the hdfs-site.xml needs to be amended. Set
-the property dfs.block.local-path-access.user
-to be the only user that can use the shortcut.
-This has to be the user that started HBase. Then in hbase-site.xml,
-set dfs.client.read.shortcircuit to be true
-
-
-The DataNodes need to be restarted in order to pick up the new
-configuration. Be aware that if a process started under another
-username than the one configured here also has the shortcircuit
-enabled, it will get an Exception regarding an unauthorized access but
-the data will still be read.
-
-
-
+
+
+
 ZooKeeper
 See for information on configuring ZooKeeper, and see the part
@@ -658,6 +630,39 @@
 Umbrella Jira Ticket for HDFS Improvements for HBase.
+
+ Leveraging local data
+Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via
+HDFS-2246,
+it is possible for the DFSClient to take a "short circuit" and
+read directly from disk instead of going through the DataNode when the
+data is local. What this means for HBase is that the RegionServers can
+read directly off their machine's disks instead of having to open a
+socket to talk to the DataNode, the former being generally much
+faster (see JD's Performance Talk).
+Also see HBase, mail # dev - read short circuit thread for
+more discussion around short circuit reads.
+
+To enable "short circuit" reads, you must set two properties.
+First, amend hdfs-site.xml: set
+the property dfs.block.local-path-access.user
+to the only user that is allowed to use the shortcut.
+This has to be the user that started HBase. Then, in hbase-site.xml,
+set dfs.client.read.shortcircuit to true.
+
+
+ For optimal performance when short-circuit reads are enabled, it is recommended that HDFS checksums be disabled.
+ To maintain data integrity with HDFS checksums disabled, HBase can be configured to write its own checksums into
+ its data blocks and verify against these. See .
+
+
+The DataNodes need to be restarted in order to pick up the new
+configuration. Be aware that if a process started under a different
+username from the one configured here also has short-circuit reads
+enabled, it will get an exception about unauthorized access, but
+the data will still be read.
+
+
 Performance Comparisons of HBase vs. HDFS
 A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as a MapReduce source or sink). The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues,