Index: src/docbkx/configuration.xml
===================================================================
--- src/docbkx/configuration.xml (revision 1446353)
+++ src/docbkx/configuration.xml (working copy)
@@ -221,18 +221,48 @@
 xlink:href="http://hadoop.apache.org">Hadoop Hadoop
- Please read all of this section
- Please read this section to the end. Up front we
- wade through the weeds of Hadoop versions. Later we talk of what you must do in HBase
- to make it work w/ a particular Hadoop version.
-
+ Selecting a Hadoop version is critical for your HBase deployment. The table below shows which versions of Hadoop are supported by various HBase versions. Based on your version of HBase, select the most appropriate version of Hadoop. We are not in the Hadoop distro selection business. You can use Hadoop distributions from Apache, or learn about vendor distributions of Hadoop at
+
+
+ Hadoop version support matrix
+
+
+                  HBase-0.92.x  HBase-0.94.x  HBase-0.96
+
+ Hadoop-0.20.205  S             S             X
+ Hadoop-0.22.x    S             S             X
+ Hadoop-1.0.x     S             S             S
+ Hadoop-1.1.x     NT            S             S
+ Hadoop-0.23.x    X             S             NT
+ Hadoop-2.x       X             S             S
+
-
- HBase will lose data unless it is running on an HDFS that has a durable
- sync implementation. Hadoop 0.20.2, Hadoop 0.20.203.0, and Hadoop 0.20.204.0
- DO NOT have this attribute.
- Currently only Hadoop versions 0.20.205.x or any release in excess of this
- version -- this includes hadoop 1.0.0 -- have a working, durable sync
+ Where
+
+ S = supported and tested,
+ X = not supported,
+ NT = should run, but not tested enough.
+
+
+
+ Because HBase depends on Hadoop, it bundles an instance of the Hadoop jar under its lib directory. The bundled jar is ONLY for use in standalone mode. In distributed mode, it is critical that the version of Hadoop that is out on your cluster match what is under HBase. Replace the hadoop jar found in the HBase lib directory with the hadoop jar you are running on your cluster to avoid version mismatch issues. Make sure you replace the jar in HBase everywhere on your cluster. Hadoop version mismatch issues have various manifestations, but often everything just looks like it's hung up.
+
+
+ Apache HBase 0.92 and 0.94
+ HBase 0.92 and 0.94 can work with Hadoop versions 0.20.205, 0.22.x, 1.0.x, and 1.1.x. HBase 0.94 can additionally work with Hadoop 0.23.x and 2.x, but you may have to recompile the code using the specific Maven profile (see the top-level pom.xml).
+
+
+
+ Apache HBase 0.96
+ Apache HBase 0.96.0 requires Apache Hadoop 1.x at a minimum, and it can run equally well on hadoop-2.0.
+ As of Apache HBase 0.96.x, at least Apache Hadoop 1.0.x is required. HBase will no longer run properly on older Hadoops such as 0.20.205 or branch-0.20-append. Do not move to Apache HBase 0.96.x if you cannot upgrade your Hadoop (see HBase, mail # dev - DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?).
+
+
+
+ Hadoop versions 0.20.x - 1.x
+
+ HBase will lose data unless it is running on an HDFS that has a durable
+ sync implementation. DO NOT use Hadoop 0.20.2, Hadoop 0.20.203.0, or Hadoop 0.20.204.0, which DO NOT have this attribute. Currently only Hadoop versions 0.20.205.x or any later release -- this includes hadoop-1.0.0 -- have a working, durable sync
 The Cloudera blog post An update on Apache Hadoop 1.0 by Charles Zedlweski has a nice exposition on how all the Hadoop versions relate.
@@ -252,73 +282,13 @@
 You will have to restart your cluster after making this edit. Ignore the chicken-little comment you'll find in the hdfs-default.xml in the
- description for the dfs.support.append configuration; it says it is not enabled because there
- are ... bugs in the 'append code' and is not supported in any production
- cluster.. This comment is stale, from another era, and while I'm sure there
- are bugs, the sync/append code has been running
- in production at large scale deploys and is on
- by default in the offerings of hadoop by commercial vendors
- Until recently only the
- branch-0.20-append
- branch had a working sync but no official release was ever made from this branch.
- You had to build it yourself. Michael Noll wrote a detailed blog,
- Building
- an Hadoop 0.20.x version for Apache HBase 0.90.2, on how to build an
- Hadoop from branch-0.20-append. Recommended.
- Praveen Kumar has written
- a complimentary article,
- Building Hadoop and HBase for HBase Maven application development.
-Cloudera have dfs.support.append set to true by default..
- Please use the most up-to-date Hadoop possible.
- Apache HBase 0.96.0 requires Apache Hadoop 1.0.0 at a minimum
- As of Apache HBase 0.96.x, Apache Hadoop 1.0.x at least is required. We will no
- longer run properly on older Hadoops such as 0.20.205 or branch-0.20-append.
- Do not move to Apache HBase 0.96.x if you cannot upgrade your Hadoop (see HBase, mail # dev - DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?).
- Apache HBase 0.96.0 runs on Apache Hadoop 2.0.
-
-
-
-Or use the
- Cloudera or
- MapR distributions.
- Cloudera' CDH3
- is Apache Hadoop 0.20.x plus patches including all of the
- branch-0.20-append
- additions needed to add a durable sync. Use the released, most recent version of CDH3. In CDH, append
- support is enabled by default so you do not need to make the above mentioned edits to
- hdfs-site.xml or to hbase-site.xml.
-
- MapR
- includes a commercial, reimplementation of HDFS.
- It has a durable sync as well as some other interesting features that are not
- yet in Apache Hadoop. Their M3
- product is free to use and unlimited.
-
-
- Because HBase depends on Hadoop, it bundles an instance of the
- Hadoop jar under its lib directory. The bundled jar is ONLY for use in standalone mode.
- In distributed mode, it is critical that the version of Hadoop that is out
- on your cluster match what is under HBase. Replace the hadoop jar found in the HBase
- lib directory with the hadoop jar you are running on
- your cluster to avoid version mismatch issues. Make sure you
- replace the jar in HBase everywhere on your cluster. Hadoop version
- mismatch issues have various manifestations but often all looks like
- its hung up.
- Packaging and Apache BigTop
- Apache Bigtop
- is an umbrella for packaging and tests of the Apache Hadoop
- ecosystem, including Apache HBase. Bigtop performs testing at various
- levels (packaging, platform, runtime, upgrade, etc...), developed by a
- community, with a focus on the system as a whole, rather than individual
- projects. We recommend installing Apache HBase packages as provided by a
- Bigtop release rather than rolling your own piecemeal integration of
- various component releases.
-
-
+ description for the dfs.support.append configuration.
+
+
 Apache HBase on Secure Hadoop
 Apache HBase will run on any Hadoop 0.20.x that incorporates Hadoop
- security features -- e.g. Y! 0.20S or CDH3B3 -- as long as you do as
+ security features as long as you do as
 suggested above and replace the Hadoop jar that ships with HBase with the secure version. If you want to read more about how to setup Secure HBase, see .