Index: src/docbkx/performance.xml
===================================================================
--- src/docbkx/performance.xml (revision 1413288)
+++ src/docbkx/performance.xml (working copy)
@@ -208,38 +208,10 @@
-
- HDFS Configuration
-
- Leveraging local data
-Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via
-HDFS-2246,
-it is possible for the DFSClient to take a "short circuit" and
-read directly from disk instead of going through the DataNode when the
-data is local. What this means for HBase is that the RegionServers can
-read directly off their machine's disks instead of having to open a
-socket to talk to the DataNode, the former being generally much
-faster (see JD's Performance Talk).
-Also see HBase, mail # dev - read short circuit thread for
-more discussion around short circuit reads.
-
-To enable "short circuit" reads, you must set two configurations.
-First, the hdfs-site.xml needs to be amended. Set
-the property dfs.block.local-path-access.user
-to be the only user that can use the shortcut.
-This has to be the user that started HBase. Then in hbase-site.xml,
-set dfs.client.read.shortcircuit to be true
-
-
-The DataNodes need to be restarted in order to pick up the new
-configuration. Be aware that if a process started under another
-username than the one configured here also has the shortcircuit
-enabled, it will get an Exception regarding an unauthorized access but
-the data will still be read.
-
-
-
+
+
+
ZooKeeperSee for information on configuring ZooKeeper, and see the part
@@ -658,6 +630,39 @@
Umbrella Jira Ticket for HDFS Improvements for HBase.
+
+ Leveraging local data
+Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via
+HDFS-2246,
+it is possible for the DFSClient to take a "short circuit" and
+read directly from disk instead of going through the DataNode when the
+data is local. What this means for HBase is that the RegionServers can
+read directly off their machine's disks instead of having to open a
+socket to talk to the DataNode, the former being generally much
+faster (see JD's Performance Talk).
+Also see HBase, mail # dev - read short circuit thread for
+more discussion around short circuit reads.
+
+To enable "short circuit" reads, you must set two configuration properties.
+First, amend hdfs-site.xml: set
+the property dfs.block.local-path-access.user
+to the only user allowed to use the shortcut.
+This has to be the user that started HBase. Then, in hbase-site.xml,
+set dfs.client.read.shortcircuit to true.
+
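+<para>As a minimal sketch of the two settings described above, the fragments
+might look like the following. The user name <literal>hbase</literal> is an
+assumption for illustration; substitute the account that actually starts HBase.</para>
+<programlisting><![CDATA[
+<!-- hdfs-site.xml: the only user allowed to take the short circuit -->
+<property>
+  <name>dfs.block.local-path-access.user</name>
+  <value>hbase</value> <!-- assumed user name -->
+</property>
+
+<!-- hbase-site.xml: let the DFSClient read local blocks directly from disk -->
+<property>
+  <name>dfs.client.read.shortcircuit</name>
+  <value>true</value>
+</property>
+]]></programlisting>
+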
+
+ For optimal performance when short-circuit reads are enabled, it is recommended that HDFS checksums be disabled.
+ To maintain data integrity with HDFS checksums disabled, HBase can be configured to write its own checksums into
+ its data blocks and verify against these. See .
+
+
+The DataNodes need to be restarted in order to pick up the new
+configuration. Be aware that if a process started under a username
+other than the one configured here also has short-circuit reads
+enabled, it will get an Exception about unauthorized access, but
+the data will still be read.
+
+ Performance Comparisons of HBase vs. HDFSA fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as
a MapReduce source or sink). The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues,