Index: src/docbkx/upgrading.xml =================================================================== --- src/docbkx/upgrading.xml (revision 1137243) +++ src/docbkx/upgrading.xml (working copy) @@ -9,7 +9,7 @@ xmlns:db="http://docbook.org/ns/docbook"> Upgrading - Review , in particular the section on Hadoop version. + Review , in particular the section on Hadoop version.
Upgrading to HBase 0.90.x from 0.20.x or 0.89.x Index: src/docbkx/book.xml =================================================================== --- src/docbkx/book.xml (revision 1137243) +++ src/docbkx/book.xml (working copy) @@ -64,8 +64,8 @@ + - Index: src/docbkx/getting_started.xml =================================================================== --- src/docbkx/getting_started.xml (revision 1137243) +++ src/docbkx/getting_started.xml (working copy) @@ -13,8 +13,8 @@ Introduction will get you up and - running on a single-node instance of HBase using the local filesystem. The - describes setup + running on a single-node instance of HBase using the local filesystem. + describes setup of HBase in distributed mode running on top of HDFS.
@@ -179,7 +179,7 @@ Where to go next The above described standalone setup is good for testing and - experiments only. Next move on to where we'll go into + experiments only. Next move on to where we'll go into depth on the different HBase run modes, requirements and critical configurations needed setting up a distributed HBase deploy. @@ -185,764 +185,4 @@ -
- Not-so-quick Start Guide - -
- Requirements - - HBase has the following requirements. Please read the section - below carefully and ensure that all requirements have been satisfied. - Failure to do so will cause you (and us) grief debugging strange errors - and/or data loss. - -
- java - - Just like Hadoop, HBase requires java 6 from Oracle. Usually - you'll want to use the latest version available except the problematic - u18 (u24 is the latest version as of this writing). -
- -
- <link - xlink:href="http://hadoop.apache.org">hadoop</link><indexterm> - <primary>Hadoop</primary> - </indexterm> - - - This version of HBase will only run on Hadoop - 0.20.x. It will not run on hadoop 0.21.x (nor 0.22.x). - HBase will lose data unless it is running on an HDFS that has a durable - sync. Hadoop 0.20.2 and Hadoop 0.20.203.0 DO NOT have this attribute. - Currently only the branch-0.20-append - branch has this attribute - See CHANGES.txt - in branch-0.20-append to see list of patches involved adding - append on the Hadoop 0.20 branch. - . No official releases have been made from the branch-0.20-append branch up - to now so you will have to build your own Hadoop from the tip of this - branch. Michael Noll has written a detailed blog, - Building - an Hadoop 0.20.x version for HBase 0.90.2, on how to build an - Hadoop from branch-0.20-append. Recommended. - - Or rather than build your own, you could use Cloudera's CDH3. CDH has - the 0.20-append patches needed to add a durable sync (CDH3 betas will - suffice; b2, b3, or b4). - - Because HBase depends on Hadoop, it bundles an instance of the - Hadoop jar under its lib directory. The bundled - Hadoop was made from the Apache branch-0.20-append branch at the time - of the HBase's release. The bundled jar is ONLY for use in standalone mode. - In distributed mode, it is critical that the version of Hadoop that is out - on your cluster match what is under HBase. Replace the hadoop jar found in the HBase - lib directory with the hadoop jar you are running on - your cluster to avoid version mismatch issues. Make sure you - replace the jar in HBase everywhere on your cluster. Hadoop version - mismatch issues have various manifestations but often all looks like - its hung up. - - - Hadoop Security - - HBase will run on any Hadoop 0.20.x that incorporates Hadoop - security features -- e.g. Y! 0.20S or CDH3B3 -- as long as you do as - suggested above and replace the Hadoop jar that ships with HBase - with the secure version. - -
- -
- ssh - - ssh must be installed and - sshd must be running to use Hadoop's scripts to - manage remote Hadoop and HBase daemons. You must be able to ssh to all - nodes, including your local node, using passwordless login (Google - "ssh passwordless login"). -
- -
- DNS - - HBase uses the local hostname to self-report it's IP address. - Both forward and reverse DNS resolving should work. - - If your machine has multiple interfaces, HBase will use the - interface that the primary hostname resolves to. - - If this is insufficient, you can set - hbase.regionserver.dns.interface to indicate the - primary interface. This only works if your cluster configuration is - consistent and every host has the same network interface - configuration. - - Another alternative is setting - hbase.regionserver.dns.nameserver to choose a - different nameserver than the system wide default. -
- -
- NTP - - The clocks on cluster members should be in basic alignments. - Some skew is tolerable but wild skew could generate odd behaviors. Run - NTP - on your cluster, or an equivalent. - - If you are having problems querying data, or "weird" cluster - operations, check system time! -
- -
- - <varname>ulimit</varname><indexterm> - <primary>ulimit</primary> - </indexterm> - and - <varname>nproc</varname><indexterm> - <primary>nproc</primary> - </indexterm> - - - HBase is a database. It uses a lot of files all at the same time. - The default ulimit -n -- i.e. user file limit -- of 1024 on most *nix systems - is insufficient (On mac os x its 256). Any significant amount of loading will - lead you to FAQ: Why do I - see "java.io.IOException...(Too many open files)" in my logs?. - You may also notice errors such as - 2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception increateBlockOutputStream java.io.EOFException - 2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901 - Do yourself a favor and change the upper bound on the - number of file descriptors. Set it to north of 10k. See the above - referenced FAQ for how. You should also up the hbase users' - nproc setting; under load, a low-nproc - setting could manifest as OutOfMemoryError - See Jack Levin's major hdfs issues - note up on the user list. - The requirement that a database requires upping of system limits - is not peculiar to HBase. See for example the section - Setting Shell Limits for the Oracle User in - - Short Guide to install Oracle 10 on Linux.. - - - To be clear, upping the file descriptors and nproc for the user who is - running the HBase process is an operating system configuration, not an - HBase configuration. Also, a common mistake is that administrators - will up the file descriptors for a particular user but for whatever - reason, HBase will be running as some one else. HBase prints in its - logs as the first line the ulimit its seeing. Ensure its correct. - - A useful read setting config on you hadoop cluster is Aaron - Kimballs' Configuration - Parameters: What can you just ignore? - - -
- <varname>ulimit</varname> on Ubuntu - - If you are on Ubuntu you will need to make the following - changes: - - In the file /etc/security/limits.conf add - a line like: hadoop - nofile 32768 - Replace hadoop with whatever user is running - Hadoop and HBase. If you have separate users, you will need 2 - entries, one for each user. In the same file set nproc hard and soft - limits. For example: hadoop soft/hard nproc 32000. - - In the file /etc/pam.d/common-session add - as the last line in the file: session required pam_limits.so - Otherwise the changes in /etc/security/limits.conf won't be - applied. - - Don't forget to log out and back in again for the changes to - take effect! -
-
- -
- <varname>dfs.datanode.max.xcievers</varname><indexterm> - <primary>xcievers</primary> - </indexterm> - - An Hadoop HDFS datanode has an upper bound on the number of - files that it will serve at any one time. The upper bound parameter is - called xcievers (yes, this is misspelled). Again, - before doing any loading, make sure you have configured Hadoop's - conf/hdfs-site.xml setting the - xceivers value to at least the following: - - <property> - <name>dfs.datanode.max.xcievers</name> - <value>4096</value> - </property> - - - Be sure to restart your HDFS after making the above - configuration. - - Not having this configuration in place makes for strange looking - failures. Eventually you'll see a complain in the datanode logs - complaining about the xcievers exceeded, but on the run up to this one - manifestation is complaint about missing blocks. For example: - 10/12/08 20:10:31 INFO hdfs.DFSClient: Could not obtain block - blk_XXXXXXXXXXXXXXXXXXXXXX_YYYYYYYY from any node: - java.io.IOException: No live nodes contain current block. Will get new - block locations from namenode and retry... - See Hadoop HDFS: Deceived by Xciever for an informative rant on xceivering. -
- -
- Windows - - HBase has been little tested running on windows. Running a - production install of HBase on top of windows is not - recommended. - - If you are running HBase on Windows, you must install Cygwin to have a *nix-like - environment for the shell scripts. The full details are explained in - the Windows - Installation guide. Also - search our user mailing list to pick - up latest fixes figured by windows users. -
-
- -
- HBase run modes: Standalone and Distributed - - HBase has two run modes: and . Out of the box, HBase runs in - standalone mode. To set up a distributed deploy, you will need to - configure HBase by editing files in the HBase conf - directory. - - Whatever your mode, you will need to edit - conf/hbase-env.sh to tell HBase which - java to use. In this file you set HBase environment - variables such as the heapsize and other options for the - JVM, the preferred location for log files, - etc. Set JAVA_HOME to point at the root of your - java install. - -
- Standalone HBase - - This is the default mode. Standalone mode is what is described - in the section. In - standalone mode, HBase does not use HDFS -- it uses the local - filesystem instead -- and it runs all HBase daemons and a local - ZooKeeper all up in the same JVM. Zookeeper binds to a well known port - so clients may talk to HBase. -
- -
- Distributed - - Distributed mode can be subdivided into distributed but all - daemons run on a single node -- a.k.a - pseudo-distributed-- and - fully-distributed where the daemons are spread - across all nodes in the cluster - The pseudo-distributed vs fully-distributed nomenclature - comes from Hadoop. - . - - Distributed modes require an instance of the Hadoop - Distributed File System (HDFS). See the Hadoop - requirements and instructions for how to set up a HDFS. Before - proceeding, ensure you have an appropriate, working HDFS. - - Below we describe the different distributed setups. Starting, - verification and exploration of your install, whether a - pseudo-distributed or - fully-distributed configuration is described in a - section that follows, . The same verification script applies to both - deploy types. - -
- Pseudo-distributed - - A pseudo-distributed mode is simply a distributed mode run on - a single host. Use this configuration testing and prototyping on - HBase. Do not use this configuration for production nor for - evaluating HBase performance. - - Once you have confirmed your HDFS setup, edit - conf/hbase-site.xml. This is the file into - which you add local customizations and overrides for - and . Point HBase at the running Hadoop HDFS - instance by setting the hbase.rootdir property. - This property points HBase at the Hadoop filesystem instance to use. - For example, adding the properties below to your - hbase-site.xml says that HBase should use the - /hbase directory in the HDFS whose namenode is - at port 9000 on your local machine, and that it should run with one - replica only (recommended for pseudo-distributed mode): - - -<configuration> - ... - <property> - <name>hbase.rootdir</name> - <value>hdfs://localhost:9000/hbase</value> - <description>The directory shared by RegionServers. - </description> - </property> - <property> - <name>dfs.replication</name> - <value>1</value> - <description>The replication count for HLog and HFile storage. Should not be greater than HDFS datanode count. - </description> - </property> - ... -</configuration> - - - - Let HBase create the hbase.rootdir - directory. If you don't, you'll get warning saying HBase needs a - migration run because the directory is missing files expected by - HBase (it'll create them if you let it). - - - - Above we bind to localhost. This means - that a remote client cannot connect. Amend accordingly, if you - want to connect from a remote location. - - - Now skip to for how to start and verify your - pseudo-distributed install. - See Pseudo-distributed - mode extras for notes on how to start extra Masters and - RegionServers when running pseudo-distributed. - -
- -
- Fully-distributed - - For running a fully-distributed operation on more than one - host, make the following configurations. In - hbase-site.xml, add the property - hbase.cluster.distributed and set it to - true and point the HBase - hbase.rootdir at the appropriate HDFS NameNode - and location in HDFS where you would like HBase to write data. For - example, if you namenode were running at namenode.example.org on - port 9000 and you wanted to home your HBase in HDFS at - /hbase, make the following - configuration. - - -<configuration> - ... - <property> - <name>hbase.rootdir</name> - <value>hdfs://namenode.example.org:9000/hbase</value> - <description>The directory shared by RegionServers. - </description> - </property> - <property> - <name>hbase.cluster.distributed</name> - <value>true</value> - <description>The mode the cluster will be in. Possible values are - false: standalone and pseudo-distributed setups with managed Zookeeper - true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh) - </description> - </property> - ... -</configuration> - - -
- <filename>regionservers</filename> - - In addition, a fully-distributed mode requires that you - modify conf/regionservers. The - file - lists all hosts that you would have running - HRegionServers, one host per line (This - file in HBase is like the Hadoop slaves - file). All servers listed in this file will be started and stopped - when HBase cluster start or stop is run. -
- -
- ZooKeeper<indexterm> - <primary>ZooKeeper</primary> - </indexterm> - - A distributed HBase depends on a running ZooKeeper cluster. - All participating nodes and clients need to be able to access the - running ZooKeeper ensemble. HBase by default manages a ZooKeeper - "cluster" for you. It will start and stop the ZooKeeper ensemble - as part of the HBase start/stop process. You can also manage the - ZooKeeper ensemble independent of HBase and just point HBase at - the cluster it should use. To toggle HBase management of - ZooKeeper, use the HBASE_MANAGES_ZK variable in - conf/hbase-env.sh. This variable, which - defaults to true, tells HBase whether to - start/stop the ZooKeeper ensemble servers as part of HBase - start/stop. - - When HBase manages the ZooKeeper ensemble, you can specify - ZooKeeper configuration using its native - zoo.cfg file, or, the easier option is to - just specify ZooKeeper options directly in - conf/hbase-site.xml. A ZooKeeper - configuration option can be set as a property in the HBase - hbase-site.xml XML configuration file by - prefacing the ZooKeeper option name with - hbase.zookeeper.property. For example, the - clientPort setting in ZooKeeper can be changed - by setting the - hbase.zookeeper.property.clientPort property. - For all default values used by HBase, including ZooKeeper - configuration, see . Look for the - hbase.zookeeper.property prefix - For the full list of ZooKeeper configurations, see - ZooKeeper's zoo.cfg. HBase does not ship - with a zoo.cfg so you will need to browse - the conf directory in an appropriate - ZooKeeper download. - - - You must at least list the ensemble servers in - hbase-site.xml using the - hbase.zookeeper.quorum property. This property - defaults to a single ensemble member at - localhost which is not suitable for a fully - distributed HBase. (It binds to the local machine only and remote - clients will not be able to connect). - How many ZooKeepers should I run? - - You can run a ZooKeeper ensemble that comprises 1 node - only but in production it is recommended that you run a - ZooKeeper ensemble of 3, 5 or 7 machines; the more members an - ensemble has, the more tolerant the ensemble is of host - failures. Also, run an odd number of machines. There can be no - quorum if the number of members is an even number. Give each - ZooKeeper server around 1GB of RAM, and if possible, its own - dedicated disk (A dedicated disk is the best thing you can do - to ensure a performant ZooKeeper ensemble). For very heavily - loaded clusters, run ZooKeeper servers on separate machines - from RegionServers (DataNodes and TaskTrackers). - - - For example, to have HBase manage a ZooKeeper quorum on - nodes rs{1,2,3,4,5}.example.com, bound to - port 2222 (the default is 2181) ensure - HBASE_MANAGE_ZK is commented out or set to - true in conf/hbase-env.sh - and then edit conf/hbase-site.xml and set - hbase.zookeeper.property.clientPort and - hbase.zookeeper.quorum. You should also set - hbase.zookeeper.property.dataDir to other than - the default as the default has ZooKeeper persist data under - /tmp which is often cleared on system - restart. In the example below we have ZooKeeper persist to - /user/local/zookeeper. - <configuration> - ... - <property> - <name>hbase.zookeeper.property.clientPort</name> - <value>2222</value> - <description>Property from ZooKeeper's config zoo.cfg. - The port at which the clients will connect. 
- </description> - </property> - <property> - <name>hbase.zookeeper.quorum</name> - <value>rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com</value> - <description>Comma separated list of servers in the ZooKeeper Quorum. - For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com". - By default this is set to localhost for local and pseudo-distributed modes - of operation. For a fully-distributed setup, this should be set to a full - list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh - this is the list of servers which we will start/stop ZooKeeper on. - </description> - </property> - <property> - <name>hbase.zookeeper.property.dataDir</name> - <value>/usr/local/zookeeper</value> - <description>Property from ZooKeeper's config zoo.cfg. - The directory where the snapshot is stored. - </description> - </property> - ... - </configuration> - -
- Using existing ZooKeeper ensemble - - To point HBase at an existing ZooKeeper cluster, one that - is not managed by HBase, set HBASE_MANAGES_ZK - in conf/hbase-env.sh to false - - ... - # Tell HBase whether it should manage it's own instance of Zookeeper or not. - export HBASE_MANAGES_ZK=false Next set ensemble locations - and client port, if non-standard, in - hbase-site.xml, or add a suitably - configured zoo.cfg to HBase's - CLASSPATH. HBase will prefer the - configuration found in zoo.cfg over any - settings in hbase-site.xml. - - When HBase manages ZooKeeper, it will start/stop the - ZooKeeper servers as a part of the regular start/stop scripts. - If you would like to run ZooKeeper yourself, independent of - HBase start/stop, you would do the following - - -${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper - - - Note that you can use HBase in this manner to spin up a - ZooKeeper cluster, unrelated to HBase. Just make sure to set - HBASE_MANAGES_ZK to false - if you want it to stay up across HBase restarts so that when - HBase shuts down, it doesn't take ZooKeeper down with it. - - For more information about running a distinct ZooKeeper - cluster, see the ZooKeeper Getting - Started Guide. -
-
- -
- HDFS Client Configuration - - Of note, if you have made HDFS client - configuration on your Hadoop cluster -- i.e. - configuration you want HDFS clients to use as opposed to - server-side configurations -- HBase will not see this - configuration unless you do one of the following: - - - - Add a pointer to your HADOOP_CONF_DIR - to the HBASE_CLASSPATH environment variable - in hbase-env.sh. - - - - Add a copy of hdfs-site.xml (or - hadoop-site.xml) or, better, symlinks, - under ${HBASE_HOME}/conf, or - - - - if only a small set of HDFS client configurations, add - them to hbase-site.xml. - - - - An example of such an HDFS client configuration is - dfs.replication. If for example, you want to - run with a replication factor of 5, hbase will create files with - the default of 3 unless you do the above to make the configuration - available to HBase. -
-
-
- -
- Running and Confirming Your Installation - - - - Make sure HDFS is running first. Start and stop the Hadoop HDFS - daemons by running bin/start-hdfs.sh over in the - HADOOP_HOME directory. You can ensure it started - properly by testing the put and - get of files into the Hadoop filesystem. HBase does - not normally use the mapreduce daemons. These do not need to be - started. - - - - If you are managing your own ZooKeeper, - start it and confirm its running else, HBase will start up ZooKeeper - for you as part of its start process. - - - - Start HBase with the following command: - - - - bin/start-hbase.sh - - Run the above from the - - HBASE_HOME - - directory. - - You should now have a running HBase instance. HBase logs can be - found in the logs subdirectory. Check them out - especially if HBase had trouble starting. - - - - HBase also puts up a UI listing vital attributes. By default its - deployed on the Master host at port 60010 (HBase RegionServers listen - on port 60020 by default and put up an informational http server at - 60030). If the Master were running on a host named - master.example.org on the default port, to see the - Master's homepage you'd point your browser at - http://master.example.org:60010. - - - - Once HBase has started, see the for how to - create tables, add data, scan your insertions, and finally disable and - drop your tables. - - - - To stop HBase after exiting the HBase shell enter - $ ./bin/stop-hbase.sh -stopping hbase............... Shutdown can take a moment to - complete. It can take longer if your cluster is comprised of many - machines. If you are running a distributed operation, be sure to wait - until HBase has shut down completely before stopping the Hadoop - daemons. - - -
-
- -
- Example Configurations - -
- Basic Distributed HBase Install - - Here is an example basic configuration for a distributed ten - node cluster. The nodes are named example0, - example1, etc., through node - example9 in this example. The HBase Master and the - HDFS namenode are running on the node example0. - RegionServers run on nodes - example1-example9. A 3-node - ZooKeeper ensemble runs on example1, - example2, and example3 on the - default ports. ZooKeeper data is persisted to the directory - /export/zookeeper. Below we show what the main - configuration files -- hbase-site.xml, - regionservers, and - hbase-env.sh -- found in the HBase - conf directory might look like. - -
- <filename>hbase-site.xml</filename> - - - -<?xml version="1.0"?> -<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> -<configuration> - <property> - <name>hbase.zookeeper.quorum</name> - <value>example1,example2,example3</value> - <description>The directory shared by RegionServers. - </description> - </property> - <property> - <name>hbase.zookeeper.property.dataDir</name> - <value>/export/zookeeper</value> - <description>Property from ZooKeeper's config zoo.cfg. - The directory where the snapshot is stored. - </description> - </property> - <property> - <name>hbase.rootdir</name> - <value>hdfs://example0:9000/hbase</value> - <description>The directory shared by RegionServers. - </description> - </property> - <property> - <name>hbase.cluster.distributed</name> - <value>true</value> - <description>The mode the cluster will be in. Possible values are - false: standalone and pseudo-distributed setups with managed Zookeeper - true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh) - </description> - </property> -</configuration> - - -
- -
- <filename>regionservers</filename> - - In this file you list the nodes that will run RegionServers. - In our case we run RegionServers on all but the head node - example1 which is carrying the HBase Master and - the HDFS namenode - - - example1 - example3 - example4 - example5 - example6 - example7 - example8 - example9 - -
- -
- <filename>hbase-env.sh</filename> - - Below we use a diff to show the differences - from default in the hbase-env.sh file. Here we - are setting the HBase heap to be 4G instead of the default - 1G. - - - -$ git diff hbase-env.sh -diff --git a/conf/hbase-env.sh b/conf/hbase-env.sh -index e70ebc6..96f8c27 100644 ---- a/conf/hbase-env.sh -+++ b/conf/hbase-env.sh -@@ -31,7 +31,7 @@ export JAVA_HOME=/usr/lib//jvm/java-6-sun/ - # export HBASE_CLASSPATH= - - # The maximum amount of heap to use, in MB. Default is 1000. --# export HBASE_HEAPSIZE=1000 -+export HBASE_HEAPSIZE=4096 - - # Extra Java runtime options. - # Below are what we set by default. May only work with SUN JVM. - - - - Use rsync to copy the content of the - conf directory to all nodes of the - cluster. -
-
-
-
Index: src/docbkx/configuration.xml =================================================================== --- src/docbkx/configuration.xml (revision 1137243) +++ src/docbkx/configuration.xml (working copy) @@ -8,6 +8,11 @@ xmlns:html="http://www.w3.org/1999/xhtml" xmlns:db="http://docbook.org/ns/docbook"> Configuration + This chapter is the Not-So-Quick start guide to HBase configuration. + Please read this chapter carefully and ensure that all requirements have + been satisfied. Failure to do so will cause you (and us) grief debugging strange errors + and/or data loss. + HBase uses the same configuration system as Hadoop. To configure a deploy, edit a file of environment variables @@ -31,8 +36,654 @@ content of the conf directory to all nodes of the cluster. HBase will not do this for you. Use rsync. + +
+ Java + + Just like Hadoop, HBase requires java 6 from Oracle. Usually + you'll want to use the latest version available except the problematic + u18 (u24 is the latest version as of this writing). +
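+ To confirm which Java your nodes will actually run, a quick check such as
+ the following can help (a minimal sketch; it assumes java is on the PATH
+ and that JAVA_HOME is exported in conf/hbase-env.sh):
+
+ # Print the JVM release; you want a 1.6.0 update other than u18
+ $ java -version
+ # Make sure the JVM HBase will use matches, on every node
+ $ $JAVA_HOME/bin/java -version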
+
+ Operating System +
+ ssh + + ssh must be installed and + sshd must be running to use Hadoop's scripts to + manage remote Hadoop and HBase daemons. You must be able to ssh to all + nodes, including your local node, using passwordless login (Google + "ssh passwordless login"). +
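+ A common way to set up passwordless login is with ssh-keygen and
+ ssh-copy-id, as sketched below; the user and host names are placeholders
+ and your key paths may differ:
+
+ # Generate a key pair with an empty passphrase (skip if you already have one)
+ $ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
+ # Copy the public key to every node, including the local one
+ $ ssh-copy-id user@node1.example.com
+ # Verify: this should print the remote hostname without prompting for a password
+ $ ssh user@node1.example.com hostname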
+ +
+
+ DNS
+
+ HBase uses the local hostname to self-report its IP address.
+ Both forward and reverse DNS resolving should work.
+
+ If your machine has multiple interfaces, HBase will use the
+ interface that the primary hostname resolves to.
+
+ If this is insufficient, you can set
+ hbase.regionserver.dns.interface to indicate the
+ primary interface. This only works if your cluster configuration is
+ consistent and every host has the same network interface
+ configuration.
+
+ Another alternative is setting
+ hbase.regionserver.dns.nameserver to choose a
+ different nameserver than the system-wide default.
+
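+ To sanity-check name resolution on a node, a quick look such as the
+ following can help (a sketch; it assumes the host utility is installed
+ and the address shown is a placeholder):
+
+ # The name HBase will self-report
+ $ hostname
+ # Forward lookup: should resolve to the address of the interface you expect
+ $ host $(hostname)
+ # Reverse lookup: feed back the address from the forward lookup;
+ # it should map to the same hostname
+ $ host <address-from-forward-lookup>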
+ +
+
+ NTP
+
+ The clocks on cluster members should be in basic alignment.
+ Some skew is tolerable, but wild skew can generate odd behaviors. Run
+ NTP
+ on your cluster, or an equivalent.
+
+ If you are having problems querying data, or see "weird" cluster
+ operations, check the system time!
+
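+ One way to eyeball skew across the cluster (a sketch; ntpq ships with the
+ ntp package and the host names are placeholders):
+
+ # Compare wall clocks on a few nodes
+ $ for h in rs1.example.com rs2.example.com rs3.example.com; do ssh $h date; done
+ # If ntpd is running locally, show its peers and current offsets
+ $ ntpq -p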
+ +
+
+ <varname>ulimit</varname><indexterm>
+ <primary>ulimit</primary>
+ </indexterm>
+ and
+ <varname>nproc</varname><indexterm>
+ <primary>nproc</primary>
+ </indexterm>
+
+ HBase is a database. It uses a lot of files all at the same time.
+ The default ulimit -n -- i.e. the user file limit -- of 1024 on most *nix systems
+ is insufficient (on Mac OS X it is 256). Any significant amount of loading will
+ lead you to FAQ: Why do I
+ see "java.io.IOException...(Too many open files)" in my logs?.
+ You may also notice errors such as
+ 2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
+ 2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
+ Do yourself a favor and change the upper bound on the
+ number of file descriptors. Set it to north of 10k. See the above
+ referenced FAQ for how. You should also up the HBase user's
+ nproc setting; under load, a low nproc
+ setting could manifest as OutOfMemoryError.
+ See Jack Levin's major hdfs issues
+ note up on the user list.
+ The requirement that a database needs its system limits upped
+ is not peculiar to HBase. See, for example, the section
+ Setting Shell Limits for the Oracle User in
+ Short Guide to install Oracle 10 on Linux.
+
+ To be clear, upping the file descriptors and nproc for the user who is
+ running the HBase process is an operating system configuration, not an
+ HBase configuration. Also, a common mistake is that administrators
+ will up the file descriptors for a particular user but, for whatever
+ reason, HBase ends up running as someone else. HBase prints the ulimit it is
+ seeing as the first line of its logs. Ensure that it is correct.
+
+ A useful read on setting configuration on your Hadoop cluster is Aaron
+ Kimball's Configuration
+ Parameters: What can you just ignore?
+
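+ To see the limits currently in effect for the user that will run HBase,
+ run something like the following as that user (a sketch; the values to
+ expect afterwards are whatever you configured, e.g. the 32768/32000 used
+ in the Ubuntu example below):
+
+ # Maximum number of open file descriptors for this shell
+ $ ulimit -n
+ # Maximum number of user processes (the nproc limit)
+ $ ulimit -u
+ # Re-run both after editing limits.conf and logging back in to confirm the change took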
+ <varname>ulimit</varname> on Ubuntu + + If you are on Ubuntu you will need to make the following + changes: + + In the file /etc/security/limits.conf add + a line like: hadoop - nofile 32768 + Replace hadoop with whatever user is running + Hadoop and HBase. If you have separate users, you will need 2 + entries, one for each user. In the same file set nproc hard and soft + limits. For example: hadoop soft/hard nproc 32000. + In the file /etc/pam.d/common-session add + as the last line in the file: session required pam_limits.so + Otherwise the changes in /etc/security/limits.conf won't be + applied. + Don't forget to log out and back in again for the changes to + take effect! +
+
+ +
+
+ Windows
+
+ HBase has seen little testing running on Windows. Running a
+ production install of HBase on top of Windows is not
+ recommended.
+
+ If you are running HBase on Windows, you must install Cygwin to have a *nix-like
+ environment for the shell scripts. The full details are explained in
+ the Windows
+ Installation guide. Also
+ search our user mailing list to pick
+ up the latest fixes figured out by Windows users.
+
+ +
+ +
+
+ <link
+ xlink:href="http://hadoop.apache.org">Hadoop</link><indexterm>
+ <primary>Hadoop</primary>
+ </indexterm>
+
+ This version of HBase will only run on Hadoop
+ 0.20.x. It will not run on Hadoop 0.21.x (nor 0.22.x).
+ HBase will lose data unless it is running on an HDFS that has a durable
+ sync. Hadoop 0.20.2 and Hadoop 0.20.203.0 DO NOT have this attribute.
+ Currently only the branch-0.20-append
+ branch has this attribute
+ See CHANGES.txt
+ in branch-0.20-append for the list of patches involved in adding
+ append on the Hadoop 0.20 branch.
+ . No official releases have been made from the branch-0.20-append branch up
+ to now, so you will have to build your own Hadoop from the tip of this
+ branch. Michael Noll has written a detailed blog post,
+ Building
+ an Hadoop 0.20.x version for HBase 0.90.2, on how to build an
+ Hadoop from branch-0.20-append. Recommended.
+
+ Or, rather than build your own, you could use Cloudera's CDH3. CDH has
+ the 0.20-append patches needed to add a durable sync (CDH3 betas will
+ suffice; b2, b3, or b4).
+
+ Because HBase depends on Hadoop, it bundles an instance of the
+ Hadoop jar under its lib directory. The bundled
+ Hadoop was made from the Apache branch-0.20-append branch at the time
+ of HBase's release. The bundled jar is ONLY for use in standalone mode.
+ In distributed mode, it is critical that the version of Hadoop that is out
+ on your cluster match what is under HBase. Replace the hadoop jar found in the HBase
+ lib directory with the hadoop jar you are running on
+ your cluster to avoid version mismatch issues. Make sure you
+ replace the jar in HBase everywhere on your cluster. Hadoop version
+ mismatch issues have various manifestations, but often everything looks like
+ it is hung up.
+
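+ As a sketch of the jar swap described above -- the jar names, the glob, and
+ the HADOOP_HOME/HBASE_HOME paths are illustrative, so substitute whatever
+ your Hadoop build actually produces:
+
+ # On a node running HBase: drop the bundled jar, copy in the cluster's jar
+ $ cd $HBASE_HOME/lib
+ $ rm hadoop-core-*.jar
+ $ cp $HADOOP_HOME/hadoop-core-*.jar .
+ # Repeat on (or rsync the lib directory to) every node, then restart HBase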
+ Hadoop Security + HBase will run on any Hadoop 0.20.x that incorporates Hadoop + security features -- e.g. Y! 0.20S or CDH3B3 -- as long as you do as + suggested above and replace the Hadoop jar that ships with HBase + with the secure version. +
+ +
+
+ <varname>dfs.datanode.max.xcievers</varname><indexterm>
+ <primary>xcievers</primary>
+ </indexterm>
+
+ An Hadoop HDFS datanode has an upper bound on the number of
+ files that it will serve at any one time. The upper bound parameter is
+ called xcievers (yes, this is misspelled). Again,
+ before doing any loading, make sure you have configured Hadoop's
+ conf/hdfs-site.xml, setting the
+ xceivers value to at least the following:
+
+ <property>
+ <name>dfs.datanode.max.xcievers</name>
+ <value>4096</value>
+ </property>
+
+
+ Be sure to restart your HDFS after making the above
+ configuration.
+
+ Not having this configuration in place makes for strange-looking
+ failures. Eventually you'll see a complaint in the datanode logs
+ about the xcievers limit being exceeded, but on the run up to this, one
+ manifestation is a complaint about missing blocks. For example:
+ 10/12/08 20:10:31 INFO hdfs.DFSClient: Could not obtain block
+ blk_XXXXXXXXXXXXXXXXXXXXXX_YYYYYYYY from any node:
+ java.io.IOException: No live nodes contain current block. Will get new
+ block locations from namenode and retry...
+ See Hadoop HDFS: Deceived by Xciever for an informative rant on xceivering.
+
+ +
+ +
+ HBase run modes: Standalone and Distributed + + HBase has two run modes: and . Out of the box, HBase runs in + standalone mode. To set up a distributed deploy, you will need to + configure HBase by editing files in the HBase conf + directory. + + Whatever your mode, you will need to edit + conf/hbase-env.sh to tell HBase which + java to use. In this file you set HBase environment + variables such as the heapsize and other options for the + JVM, the preferred location for log files, + etc. Set JAVA_HOME to point at the root of your + java install. + +
+ Standalone HBase + + This is the default mode. Standalone mode is what is described + in the section. In + standalone mode, HBase does not use HDFS -- it uses the local + filesystem instead -- and it runs all HBase daemons and a local + ZooKeeper all up in the same JVM. Zookeeper binds to a well known port + so clients may talk to HBase. +
+ +
+ Distributed + + Distributed mode can be subdivided into distributed but all + daemons run on a single node -- a.k.a + pseudo-distributed-- and + fully-distributed where the daemons are spread + across all nodes in the cluster + The pseudo-distributed vs fully-distributed nomenclature + comes from Hadoop. + . + + Distributed modes require an instance of the Hadoop + Distributed File System (HDFS). See the Hadoop + requirements and instructions for how to set up a HDFS. Before + proceeding, ensure you have an appropriate, working HDFS. + + Below we describe the different distributed setups. Starting, + verification and exploration of your install, whether a + pseudo-distributed or + fully-distributed configuration is described in a + section that follows, . The same verification script applies to both + deploy types. + +
+
+ Pseudo-distributed
+
+ A pseudo-distributed mode is simply a distributed mode run on
+ a single host. Use this configuration for testing and prototyping on
+ HBase. Do not use this configuration for production nor for
+ evaluating HBase performance.
+
+ Once you have confirmed your HDFS setup, edit
+ conf/hbase-site.xml. This is the file into
+ which you add local customizations and overrides for
+ and . Point HBase at the running Hadoop HDFS
+ instance by setting the hbase.rootdir property.
+ This property points HBase at the Hadoop filesystem instance to use.
+ For example, adding the properties below to your
+ hbase-site.xml says that HBase should use the
+ /hbase directory in the HDFS whose namenode is
+ at port 9000 on your local machine, and that it should run with one
+ replica only (recommended for pseudo-distributed mode):
+
+
+<configuration>
+ ...
+ <property>
+ <name>hbase.rootdir</name>
+ <value>hdfs://localhost:9000/hbase</value>
+ <description>The directory shared by RegionServers.
+ </description>
+ </property>
+ <property>
+ <name>dfs.replication</name>
+ <value>1</value>
+ <description>The replication count for HLog and HFile storage. Should not be greater than HDFS datanode count.
+ </description>
+ </property>
+ ...
+</configuration>
+
+
+
+ Let HBase create the hbase.rootdir
+ directory. If you don't, you'll get a warning saying HBase needs a
+ migration run because the directory is missing files expected by
+ HBase (it'll create them if you let it).
+
+
+
+ Above we bind to localhost. This means
+ that a remote client cannot connect. Amend accordingly if you
+ want to connect from a remote location.
+
+
+ Now skip to for how to start and verify your
+ pseudo-distributed install.
+ See Pseudo-distributed
+ mode extras for notes on how to start extra Masters and
+ RegionServers when running pseudo-distributed.
+
+ +
+
+ Fully-distributed
+
+ For running a fully-distributed operation on more than one
+ host, make the following configurations. In
+ hbase-site.xml, add the property
+ hbase.cluster.distributed, set it to
+ true, and point the HBase
+ hbase.rootdir at the appropriate HDFS NameNode
+ and at the location in HDFS where you would like HBase to write data. For
+ example, if your namenode were running at namenode.example.org on
+ port 9000 and you wanted to home your HBase in HDFS at
+ /hbase, make the following
+ configuration:
+
+
+<configuration>
+ ...
+ <property>
+ <name>hbase.rootdir</name>
+ <value>hdfs://namenode.example.org:9000/hbase</value>
+ <description>The directory shared by RegionServers.
+ </description>
+ </property>
+ <property>
+ <name>hbase.cluster.distributed</name>
+ <value>true</value>
+ <description>The mode the cluster will be in. Possible values are
+ false: standalone and pseudo-distributed setups with managed Zookeeper
+ true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
+ </description>
+ </property>
+ ...
+</configuration>
+
+
+
+ <filename>regionservers</filename>
+
+ In addition, a fully-distributed mode requires that you
+ modify conf/regionservers. The
+ file
+ lists all hosts that you would have running
+ HRegionServers, one host per line (this
+ file in HBase is like the Hadoop slaves
+ file). All servers listed in this file will be started and stopped
+ when the HBase cluster is started or stopped.
+
+ +
+ ZooKeeper and HBase + See section for ZooKeeper setup for HBase. +
+ +
+ HDFS Client Configuration + + Of note, if you have made HDFS client + configuration on your Hadoop cluster -- i.e. + configuration you want HDFS clients to use as opposed to + server-side configurations -- HBase will not see this + configuration unless you do one of the following: + + + + Add a pointer to your HADOOP_CONF_DIR + to the HBASE_CLASSPATH environment variable + in hbase-env.sh. + + + + Add a copy of hdfs-site.xml (or + hadoop-site.xml) or, better, symlinks, + under ${HBASE_HOME}/conf, or + + + + if only a small set of HDFS client configurations, add + them to hbase-site.xml. + + + + An example of such an HDFS client configuration is + dfs.replication. If for example, you want to + run with a replication factor of 5, hbase will create files with + the default of 3 unless you do the above to make the configuration + available to HBase. +
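+ As a sketch of the first option above -- the /etc/hadoop/conf path is an
+ assumption, so point it at wherever your Hadoop configuration actually
+ lives -- conf/hbase-env.sh could carry:
+
+ # Let HBase pick up the HDFS client-side configuration (e.g. dfs.replication)
+ export HADOOP_CONF_DIR=/etc/hadoop/conf
+ export HBASE_CLASSPATH=$HADOOP_CONF_DIR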
+
+
+ +
+
+ Running and Confirming Your Installation
+
+
+
+ Make sure HDFS is running first. Start and stop the Hadoop HDFS
+ daemons by running bin/start-hdfs.sh over in the
+ HADOOP_HOME directory. You can ensure it started
+ properly by testing the put and
+ get of files into the Hadoop filesystem. HBase does
+ not normally use the MapReduce daemons. These do not need to be
+ started.
+
+
+
+ If you are managing your own ZooKeeper,
+ start it and confirm it is running; otherwise, HBase will start up ZooKeeper
+ for you as part of its start process.
+
+
+
+ Start HBase with the following command:
+
+
+
+ bin/start-hbase.sh
+
+ Run the above from the
+
+ HBASE_HOME
+
+ directory.
+
+ You should now have a running HBase instance. HBase logs can be
+ found in the logs subdirectory. Check them out
+ especially if HBase had trouble starting.
+
+
+
+ HBase also puts up a UI listing vital attributes. By default it is
+ deployed on the Master host at port 60010 (HBase RegionServers listen
+ on port 60020 by default and put up an informational http server at
+ 60030). If the Master were running on a host named
+ master.example.org on the default port, to see the
+ Master's homepage you'd point your browser at
+ http://master.example.org:60010.
+
+
+
+ Once HBase has started, see the for how to
+ create tables, add data, scan your insertions, and finally disable and
+ drop your tables.
+
+
+
+ To stop HBase, after exiting the HBase shell enter
+ $ ./bin/stop-hbase.sh
+stopping hbase............... Shutdown can take a moment to
+ complete. It can take longer if your cluster comprises many
+ machines. If you are running a distributed operation, be sure to wait
+ until HBase has shut down completely before stopping the Hadoop
+ daemons.
+
+
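+ A quick smoke test of the sequence above might look like the following
+ (a sketch; the paths and the log-file glob are illustrative):
+
+ # Confirm HDFS is healthy by round-tripping a small file
+ $ $HADOOP_HOME/bin/hadoop fs -put /etc/hosts /tmp/hosts.smoketest
+ $ $HADOOP_HOME/bin/hadoop fs -cat /tmp/hosts.smoketest
+ $ $HADOOP_HOME/bin/hadoop fs -rm /tmp/hosts.smoketest
+ # Start HBase and watch the Master log for startup problems
+ $ $HBASE_HOME/bin/start-hbase.sh
+ $ tail -f $HBASE_HOME/logs/hbase-*-master-*.log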
+
+ +
+ ZooKeeper<indexterm> + <primary>ZooKeeper</primary> + </indexterm> + + A distributed HBase depends on a running ZooKeeper cluster. + All participating nodes and clients need to be able to access the + running ZooKeeper ensemble. HBase by default manages a ZooKeeper + "cluster" for you. It will start and stop the ZooKeeper ensemble + as part of the HBase start/stop process. You can also manage the + ZooKeeper ensemble independent of HBase and just point HBase at + the cluster it should use. To toggle HBase management of + ZooKeeper, use the HBASE_MANAGES_ZK variable in + conf/hbase-env.sh. This variable, which + defaults to true, tells HBase whether to + start/stop the ZooKeeper ensemble servers as part of HBase + start/stop. + + When HBase manages the ZooKeeper ensemble, you can specify + ZooKeeper configuration using its native + zoo.cfg file, or, the easier option is to + just specify ZooKeeper options directly in + conf/hbase-site.xml. A ZooKeeper + configuration option can be set as a property in the HBase + hbase-site.xml XML configuration file by + prefacing the ZooKeeper option name with + hbase.zookeeper.property. For example, the + clientPort setting in ZooKeeper can be changed + by setting the + hbase.zookeeper.property.clientPort property. + For all default values used by HBase, including ZooKeeper + configuration, see . Look for the + hbase.zookeeper.property prefix + For the full list of ZooKeeper configurations, see + ZooKeeper's zoo.cfg. HBase does not ship + with a zoo.cfg so you will need to browse + the conf directory in an appropriate + ZooKeeper download. + + + You must at least list the ensemble servers in + hbase-site.xml using the + hbase.zookeeper.quorum property. This property + defaults to a single ensemble member at + localhost which is not suitable for a fully + distributed HBase. (It binds to the local machine only and remote + clients will not be able to connect). + How many ZooKeepers should I run? + + You can run a ZooKeeper ensemble that comprises 1 node + only but in production it is recommended that you run a + ZooKeeper ensemble of 3, 5 or 7 machines; the more members an + ensemble has, the more tolerant the ensemble is of host + failures. Also, run an odd number of machines. There can be no + quorum if the number of members is an even number. Give each + ZooKeeper server around 1GB of RAM, and if possible, its own + dedicated disk (A dedicated disk is the best thing you can do + to ensure a performant ZooKeeper ensemble). For very heavily + loaded clusters, run ZooKeeper servers on separate machines + from RegionServers (DataNodes and TaskTrackers). + + + For example, to have HBase manage a ZooKeeper quorum on + nodes rs{1,2,3,4,5}.example.com, bound to + port 2222 (the default is 2181) ensure + HBASE_MANAGE_ZK is commented out or set to + true in conf/hbase-env.sh + and then edit conf/hbase-site.xml and set + hbase.zookeeper.property.clientPort and + hbase.zookeeper.quorum. You should also set + hbase.zookeeper.property.dataDir to other than + the default as the default has ZooKeeper persist data under + /tmp which is often cleared on system + restart. In the example below we have ZooKeeper persist to + /user/local/zookeeper. + <configuration> + ... + <property> + <name>hbase.zookeeper.property.clientPort</name> + <value>2222</value> + <description>Property from ZooKeeper's config zoo.cfg. + The port at which the clients will connect. 
+ </description> + </property> + <property> + <name>hbase.zookeeper.quorum</name> + <value>rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com</value> + <description>Comma separated list of servers in the ZooKeeper Quorum. + For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com". + By default this is set to localhost for local and pseudo-distributed modes + of operation. For a fully-distributed setup, this should be set to a full + list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh + this is the list of servers which we will start/stop ZooKeeper on. + </description> + </property> + <property> + <name>hbase.zookeeper.property.dataDir</name> + <value>/usr/local/zookeeper</value> + <description>Property from ZooKeeper's config zoo.cfg. + The directory where the snapshot is stored. + </description> + </property> + ... + </configuration> + +
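+ Once the quorum is up, you can check that each ensemble member is
+ answering with ZooKeeper's four-letter ruok command (a sketch; it assumes
+ nc is installed and uses the 2222 client port from the example above --
+ each member should answer imok):
+
+ $ for h in rs{1,2,3,4,5}.example.com; do
+ echo -n "$h: "; echo ruok | nc $h 2222; echo
+ done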
+ Using existing ZooKeeper ensemble + + To point HBase at an existing ZooKeeper cluster, one that + is not managed by HBase, set HBASE_MANAGES_ZK + in conf/hbase-env.sh to false + + ... + # Tell HBase whether it should manage it's own instance of Zookeeper or not. + export HBASE_MANAGES_ZK=false Next set ensemble locations + and client port, if non-standard, in + hbase-site.xml, or add a suitably + configured zoo.cfg to HBase's + CLASSPATH. HBase will prefer the + configuration found in zoo.cfg over any + settings in hbase-site.xml. + + When HBase manages ZooKeeper, it will start/stop the + ZooKeeper servers as a part of the regular start/stop scripts. + If you would like to run ZooKeeper yourself, independent of + HBase start/stop, you would do the following + + +${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper + + + Note that you can use HBase in this manner to spin up a + ZooKeeper cluster, unrelated to HBase. Just make sure to set + HBASE_MANAGES_ZK to false + if you want it to stay up across HBase restarts so that when + HBase shuts down, it doesn't take ZooKeeper down with it. + + For more information about running a distinct ZooKeeper + cluster, see the ZooKeeper Getting + Started Guide. Additionally, see the ZooKeeper Wiki or the + ZooKeeper documentation + for more information on ZooKeeper sizing. + +
+
+ + +
+ Configuration Files +
<filename>hbase-site.xml</filename> and <filename>hbase-default.xml</filename> Just as in Hadoop where you add site-specific HDFS configuration @@ -90,6 +741,183 @@
+
Client configuration and dependencies connecting to an HBase cluster
+
+ Since the HBase Master may move around, clients bootstrap by looking to ZooKeeper for
+ current critical locations. ZooKeeper is where all these values are kept. Thus clients
+ require the location of the ZooKeeper ensemble before they can do anything else.
+ Usually this ensemble location is kept out in hbase-site.xml and
+ is picked up by the client from the CLASSPATH.
+
+ If you are configuring an IDE to run an HBase client, you should
+ include the conf/ directory on your classpath so
+ hbase-site.xml settings can be found (or
+ add src/test/resources to pick up the hbase-site.xml
+ used by tests).
+
+
+ Minimally, a client of HBase needs the hbase, hadoop, log4j, commons-logging, commons-lang,
+ and ZooKeeper jars in its CLASSPATH when connecting to a cluster.
+
+
+ An example basic hbase-site.xml for client only
+ might look as follows:
+
+<configuration>
+ <property>
+ <name>hbase.zookeeper.quorum</name>
+ <value>example1,example2,example3</value>
+ <description>Comma separated list of servers in the ZooKeeper ensemble.
+ </description>
+ </property>
+</configuration>
+
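+ From the shell, a minimal client CLASSPATH might be assembled as sketched
+ below; every jar name, version, and path here is a placeholder for whatever
+ ships with your HBase and Hadoop, and MyHBaseClient stands in for your own
+ client class:
+
+ # Illustrative only -- substitute the jars you actually have
+ $ export CLASSPATH=/path/to/conf:/path/to/hbase-X.X.X.jar:/path/to/hadoop-core-X.X.X.jar:/path/to/zookeeper-X.X.X.jar:/path/to/log4j-X.X.X.jar:/path/to/commons-logging-X.X.X.jar:/path/to/commons-lang-X.X.X.jar
+ $ java -cp $CLASSPATH MyHBaseClient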
+ Java client configuration + The configuration used by a Java client is kept + in an HBaseConfiguration instance. + The factory method on HBaseConfiguration, HBaseConfiguration.create();, + on invocation, will read in the content of the first hbase-site.xml found on + the client's CLASSPATH, if one is present + (Invocation will also factor in any hbase-default.xml found; + an hbase-default.xml ships inside the hbase.X.X.X.jar). + It is also possible to specify configuration directly without having to read from a + hbase-site.xml. For example, to set the ZooKeeper + ensemble for the cluster programmatically do as follows: + Configuration config = HBaseConfiguration.create(); +config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zookeeper locally + If multiple ZooKeeper instances make up your ZooKeeper ensemble, + they may be specified in a comma-separated list (just as in the hbase-site.xml file). + This populated Configuration instance can then be passed to an + HTable, + and so on. + +
+
+ +
+ +
+ Example Configurations + +
+ Basic Distributed HBase Install + + Here is an example basic configuration for a distributed ten + node cluster. The nodes are named example0, + example1, etc., through node + example9 in this example. The HBase Master and the + HDFS namenode are running on the node example0. + RegionServers run on nodes + example1-example9. A 3-node + ZooKeeper ensemble runs on example1, + example2, and example3 on the + default ports. ZooKeeper data is persisted to the directory + /export/zookeeper. Below we show what the main + configuration files -- hbase-site.xml, + regionservers, and + hbase-env.sh -- found in the HBase + conf directory might look like. + +
+ <filename>hbase-site.xml</filename> + + + +<?xml version="1.0"?> +<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> +<configuration> + <property> + <name>hbase.zookeeper.quorum</name> + <value>example1,example2,example3</value> + <description>The directory shared by RegionServers. + </description> + </property> + <property> + <name>hbase.zookeeper.property.dataDir</name> + <value>/export/zookeeper</value> + <description>Property from ZooKeeper's config zoo.cfg. + The directory where the snapshot is stored. + </description> + </property> + <property> + <name>hbase.rootdir</name> + <value>hdfs://example0:9000/hbase</value> + <description>The directory shared by RegionServers. + </description> + </property> + <property> + <name>hbase.cluster.distributed</name> + <value>true</value> + <description>The mode the cluster will be in. Possible values are + false: standalone and pseudo-distributed setups with managed Zookeeper + true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh) + </description> + </property> +</configuration> + + +
+ +
+
+ <filename>regionservers</filename>
+
+ In this file you list the nodes that will run RegionServers.
+ In our case we run RegionServers on all nodes but the head node
+ example0, which carries the HBase Master and
+ the HDFS namenode:
+
+
+ example1
+ example2
+ example3
+ example4
+ example5
+ example6
+ example7
+ example8
+ example9
+
+ +
+ <filename>hbase-env.sh</filename> + + Below we use a diff to show the differences + from default in the hbase-env.sh file. Here we + are setting the HBase heap to be 4G instead of the default + 1G. + + + +$ git diff hbase-env.sh +diff --git a/conf/hbase-env.sh b/conf/hbase-env.sh +index e70ebc6..96f8c27 100644 +--- a/conf/hbase-env.sh ++++ b/conf/hbase-env.sh +@@ -31,7 +31,7 @@ export JAVA_HOME=/usr/lib//jvm/java-6-sun/ + # export HBASE_CLASSPATH= + + # The maximum amount of heap to use, in MB. Default is 1000. +-# export HBASE_HEAPSIZE=1000 ++export HBASE_HEAPSIZE=4096 + + # Extra Java runtime options. + # Below are what we set by default. May only work with SUN JVM. + + + + Use rsync to copy the content of the + conf directory to all nodes of the + cluster. +
+
+
+ +
The Important Configurations Below we list what the important @@ -99,10 +927,7 @@
Required Configurations - See . - It lists at least two required configurations needed running HBase bearing - load: i.e. and - . + Review the and sections.
@@ -130,10 +955,7 @@
Number of ZooKeeper Instances - It is best to use an odd number of machines (1, 3, 5) for a ZooKeeper ensemble. - See the ZooKeeper Wiki or the - ZooKeeper documentation - for more information on ZooKeeper sizing. + See .
<varname>hbase.regionserver.handler.count</varname> @@ -264,63 +1086,5 @@
-
Client configuration and dependencies connecting to an HBase cluster - - - Since the HBase Master may move around, clients bootstrap by looking to ZooKeeper for - current critical locations. ZooKeeper is where all these values are kept. Thus clients - require the location of the ZooKeeper ensemble information before they can do anything else. - Usually this the ensemble location is kept out in the hbase-site.xml and - is picked up by the client from the CLASSPATH. - - If you are configuring an IDE to run a HBase client, you should - include the conf/ directory on your classpath so - hbase-site.xml settings can be found (or - add src/test/resources to pick up the hbase-site.xml - used by tests). - - - Minimally, a client of HBase needs the hbase, hadoop, log4j, commons-logging, commons-lang, - and ZooKeeper jars in its CLASSPATH connecting to a cluster. - - - An example basic hbase-site.xml for client only - might look as follows: - - - - - hbase.zookeeper.quorum - example1,example2,example3 - The directory shared by region servers. - - - -]]> - -
- Java client configuration - How Java reads hbase-site.xml content - The configuration used by a java client is kept - in an HBaseConfiguration instance. - The factory method on HBaseConfiguration, HBaseConfiguration.create();, - on invocation, will read in the content of the first hbase-site.xml found on - the client's CLASSPATH, if one is present - (Invocation will also factor in any hbase-default.xml found; - an hbase-default.xml ships inside the hbase.X.X.X.jar). - It is also possible to specify configuration directly without having to read from a - hbase-site.xml. For example, to set the ZooKeeper - ensemble for the cluster programmatically do as follows: - Configuration config = HBaseConfiguration.create(); -config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zookeeper locally - If multiple ZooKeeper instances make up your ZooKeeper ensemble, - they may be specified in a comma-separated list (just as in the hbase-site.xml file). - This populated Configuration instance can then be passed to an - HTable, - and so on. - -
-