diff --git src/main/docbkx/configuration.xml src/main/docbkx/configuration.xml
index 9a42c60..f9086b3 100644
--- src/main/docbkx/configuration.xml
+++ src/main/docbkx/configuration.xml
@@ -618,7 +618,9 @@ Index: pom.xml
instance of the Hadoop Distributed File System (HDFS).
Fully-distributed mode can ONLY run on HDFS. See the Hadoop
- requirements and instructions for how to set up HDFS.
+ requirements and instructions for how to set up HDFS for Hadoop 1.x. A good
+ walk-through for setting up HDFS on Hadoop 2 is at
+ http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide.
Below we describe the different distributed setups. Starting, verification and
exploration of your install, whether a pseudo-distributed or
@@ -628,130 +630,30 @@ Index: pom.xml
Pseudo-distributed
-
+
+ Pseudo-Distributed Quickstart
+ A quickstart has been added to the chapter. See . Some of the information that was originally in this
+ section has been moved there.
+
+
A pseudo-distributed mode is simply a fully-distributed mode run on a single host. Use
this configuration for testing and prototyping HBase. Do not use this configuration for
production or for evaluating HBase performance.
- First, if you want to run on HDFS rather than on the local filesystem, setup your
- HDFS. You can set up HDFS also in pseudo-distributed mode (TODO: Add pointer to HOWTO doc;
- the hadoop site doesn't have any any more). Ensure you have a working HDFS before
- proceeding.
-
- Next, configure HBase. Edit conf/hbase-site.xml. This is the file
- into which you add local customizations and overrides. At a minimum, you must tell HBase
- to run in (pseudo-)distributed mode rather than in default standalone mode. To do this,
- set the hbase.cluster.distributed property to true (Its default is
- false). The absolute bare-minimum hbase-site.xml
- is therefore as follows:
-
-
- hbase.cluster.distributed
- true
-
-
-]]>
-
- With this configuration, HBase will start up an HBase Master process, a ZooKeeper
- server, and a RegionServer process running against the local filesystem writing to
- wherever your operating system stores temporary files into a directory named
- hbase-YOUR_USER_NAME.
-
- Such a setup, using the local filesystem and writing to the operating systems's
- temporary directory is an ephemeral setup; the Hadoop local filesystem -- which is what
- HBase uses when it is writing the local filesytem -- would lose data unless the system
- was shutdown properly in versions of HBase before 0.98.4 and 1.0.0 (see
- HBASE-11218 Data
- loss in HBase standalone mode). Writing to the operating
- system's temporary directory can also make for data loss when the machine is restarted as
- this directory is usually cleared on reboot. For a more permanent setup, see the next
- example where we make use of an instance of HDFS; HBase data will be written to the Hadoop
- distributed filesystem rather than to the local filesystem's tmp directory.
- In this conf/hbase-site.xml example, the
- hbase.rootdir property points to the local HDFS instance homed on the
- node h-24-30.example.com.
-
- Let HBase create ${hbase.rootdir}
- Let HBase create the hbase.rootdir directory. If you don't,
- you'll get warning saying HBase needs a migration run because the directory is missing
- files expected by HBase (it'll create them if you let it).
-
-
-<configuration>
- <property>
- <name>hbase.rootdir</name>
- <value>hdfs://h-24-30.sfo.stumble.net:8020/hbase</value>
- </property>
- <property>
- <name>hbase.cluster.distributed</name>
- <value>true</value>
- </property>
-</configuration>
-
-
- Now skip to for how to start and verify your pseudo-distributed install.
- See for notes on how to start extra Masters and RegionServers
- when running pseudo-distributed.
-
-
-
- Pseudo-distributed Extras
-
-
- Startup
- To start up the initial HBase cluster...
- % bin/start-hbase.sh
- To start up an extra backup master(s) on the same server run...
- % bin/local-master-backup.sh start 1
- ... the '1' means use ports 16001 & 16011, and this backup master's logfile
- will be at logs/hbase-${USER}-1-master-${HOSTNAME}.log.
- To startup multiple backup masters run...
- % bin/local-master-backup.sh start 2 3
- You can start up to 9 backup masters (10 total).
- To start up more regionservers...
- % bin/local-regionservers.sh start 1
- ... where '1' means use ports 16201 & 16301 and its logfile will be at
- `logs/hbase-${USER}-1-regionserver-${HOSTNAME}.log.
- To add 4 more regionservers in addition to the one you just started by
- running...
- % bin/local-regionservers.sh start 2 3 4 5
- This supports up to 99 extra regionservers (100 total).
-
-
- Stop
- Assuming you want to stop master backup # 1, run...
- % cat /tmp/hbase-${USER}-1-master.pid |xargs kill -9
- Note that bin/local-master-backup.sh stop 1 will try to stop the cluster along
- with the master.
- To stop an individual regionserver, run...
- % bin/local-regionservers.sh stop 1
-
-
-
-
-
-
Fully-distributed
- For running a fully-distributed operation on more than one host, make the following
- configurations. In hbase-site.xml, add the property
- hbase.cluster.distributed and set it to true and
- point the HBase hbase.rootdir at the appropriate HDFS NameNode and
- location in HDFS where you would like HBase to write data. For example, if you namenode
- were running at namenode.example.org on port 8020 and you wanted to home your HBase in
- HDFS at /hbase, make the following configuration.
+ Typically, HBase is run in a fully-distributed mode, where the different daemons run
+ on multiple servers in the cluster. The hbase.cluster.distributed property is
+ set to true, just as in pseudo-distributed mode. The
+ hbase.rootdir is also typically set to an HDFS URI not hosted on the
+ localhost. Following is an example of this bare-bones distributed configuration.
@@ -774,25 +676,31 @@ Index: pom.xml
]]>
+
+
+ Distributed HBase Quickstart
+ See for a walk-through of a simple three-node
+ cluster configuration with multiple ZooKeeper, backup HMaster, and RegionServer
+ instances.
+
+
+
+ Distributed RegionServers
+ Typically, your cluster will contain multiple RegionServers all running on different
+ servers, as well as primary and backup Master and Zookeeper daemons. The
+ conf/regionservers file on the master server contains a list of hosts
+ whose RegionServers are associated with this cluster. Each host is on a separate line. All
+ hosts listed in this file will have their RegionServer processes started and stopped when
+ the master server starts or stops.
+
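The start/stop behavior driven by conf/regionservers can be sketched as a loop over hostnames, one per line. This is an illustrative sketch only, not the actual HBase script: it echoes instead of issuing remote commands, and the file and hostnames are hypothetical stand-ins.

```shell
# Sketch: walk a regionservers-style file, one hostname per line.
# A temp file stands in for conf/regionservers; echo stands in for ssh.
regionservers=$(mktemp)
printf 'node-b.example.com\nnode-c.example.com\n' > "$regionservers"
while read -r host; do
  echo "would start regionserver on $host"
done < "$regionservers"
rm -f "$regionservers"
```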
-
- regionservers
-
- In addition, a fully-distributed mode requires that you modify
- conf/regionservers. The file lists all hosts that you would have running
- HRegionServers, one host per line (This file in HBase is
- like the Hadoop slaves file). All servers listed in this file will
- be started and stopped when HBase cluster start or stop is run.
-
-
- ZooKeeper and HBaseSee section for ZooKeeper setup for HBase.
-
+
@@ -871,7 +779,7 @@ stopping hbase...............
of many machines. If you are running a distributed operation, be sure to wait until HBase
has shut down completely before stopping the Hadoop daemons.
-
+
diff --git src/main/docbkx/getting_started.xml src/main/docbkx/getting_started.xml
index 117e1ec..8499c22 100644
--- src/main/docbkx/getting_started.xml
+++ src/main/docbkx/getting_started.xml
@@ -40,46 +40,46 @@
- Quick Start
-
- This guide describes setup of a standalone HBase instance. It will run against the local
- filesystem. In later sections we will take you through how to run HBase on Apache Hadoop's
- HDFS, a distributed filesystem. This section shows you how to create a table in HBase,
- inserting rows into your new HBase table via the HBase shell, and then
- cleaning up and shutting down your standalone, local filesystem-based HBase instance. The
- below exercise should take no more than ten minutes (not including download time).
- Quick Start - Standalone HBase
+
+ This guide describes setup of a standalone HBase instance running against the local
+ filesystem. This is not an appropriate configuration for a production instance of HBase, but
+ will allow you to experiment with HBase. This section shows you how to create a table in
+ HBase using the hbase shell CLI, insert rows into the table, perform put
+ and scan operations against the table, enable or disable the table, and start and stop HBase.
+ Apart from downloading HBase, this procedure should take less than 10 minutes.
+ Local Filesystem and Durability
- Using HBase with a LocalFileSystem does not currently guarantee durability. The HDFS
- local filesystem implementation will lose edits if files are not properly closed -- which is
- very likely to happen when experimenting with a new download. You need to run HBase on HDFS
- to ensure all writes are preserved. Running against the local filesystem though will get you
- off the ground quickly and get you familiar with how the general system works so lets run
- with it for now. See Using HBase with a local filesystem does not guarantee durability. The HDFS
+ local filesystem implementation will lose edits if files are not properly closed. This is
+ very likely to happen when you are experimenting with new software, starting and stopping
+ the daemons often and not always cleanly. You need to run HBase on HDFS
+ to ensure all writes are preserved. Running against the local filesystem is intended as a
+ shortcut to get you familiar with how the general system works, as the very first phase of
+ evaluation. See and its associated issues
- for more details.
-
+ for more details about the issues of running on the local filesystem.
+
- Loopback IP
- The below advice is for hbase-0.94.x and older versions only. We believe this
- fixed in hbase-0.96.0 and beyond (let us know if we have it wrong). There
- should be no need of the below modification to /etc/hosts in later
- versions of HBase.
-
- HBase expects the loopback IP address to be 127.0.0.1. Ubuntu and some other
- distributions, for example, will default to 127.0.1.1 and this will cause problems for you
- See Why does
- HBase care about /etc/hosts? for detail.
- .
- /etc/hosts should look something like this:
-
+ Loopback IP - HBase 0.94.x and earlier
+ The below advice is for hbase-0.94.x and older versions only. This is fixed in
+ hbase-0.96.0 and beyond.
+
+ Prior to HBase 0.94.x, HBase expected the loopback IP address to be 127.0.0.1. Ubuntu
+ and some other distributions default to 127.0.1.1, and this will cause problems for you. See Why does HBase
+ care about /etc/hosts? for detail.
+
+ Example /etc/hosts File for Ubuntu
+ The following /etc/hosts file works correctly for HBase 0.94.x
+ and earlier, on Ubuntu. Use this as a template if you run into trouble.
+
127.0.0.1 localhost
127.0.0.1 ubuntu.ubuntu-domain ubuntu
-
-
+
+
@@ -89,154 +89,577 @@
- Download and unpack the latest stable release.
+ Get Started with HBase
- Choose a download site from this list of
+ Download, Configure, and Start HBase
+
+ Choose a download site from this list of Apache Download Mirrors.
Click on the suggested top link. This will take you to a mirror of HBase
Releases. Click on the folder named stable and then
- download the file that ends in .tar.gz to your local filesystem; e.g.
- hbase-0.94.2.tar.gz.
-
- Decompress and untar your download and then change into the unpacked directory.
-
- .tar.gz
-$ cd hbase-]]>
-
-
- At this point, you are ready to start HBase. But before starting it, edit
- conf/hbase-site.xml, the file you write your site-specific
- configurations into. Set hbase.rootdir, the directory HBase writes data
- to, and hbase.zookeeper.property.dataDir, the directory ZooKeeper writes
- its data too:
-
-
+ download the binary file that ends in .tar.gz to your local filesystem. Be
+ sure to choose the version that corresponds with the version of Hadoop you are likely to use
+ later. In most cases, you should choose the file for Hadoop 2, which will be called something
+ like hbase-0.98.3-hadoop2-bin.tar.gz. Do not download the file ending in
+ src.tar.gz for now.
+
+
+ Extract the downloaded file, and change to the newly-created directory.
+
+$ tar xzvf hbase-]]>-hadoop2-bin.tar.gz
+$ cd hbase-]]>-hadoop2/
+
+
+
+ Edit conf/hbase-site.xml, which is the main HBase configuration
+ file. At this time, you only need to specify the directory on the local filesystem where
+ HBase and ZooKeeper write data. By default, a new directory is created under /tmp. Many
+ servers are configured to delete the contents of /tmp upon reboot, so you should store
+ the data elsewhere. The following configuration will store HBase's data in the
+ hbase directory, in the home directory of the user called
+ testuser. Paste the <property> tags beneath the
+ <configuration> tags, which should be empty in a new HBase install.
+
+ Example hbase-site.xml for Standalone HBase
+ hbase.rootdir
- file:///DIRECTORY/hbase
+ file:///home/testuser/hbasehbase.zookeeper.property.dataDir
- /DIRECTORY/zookeeper
+ /home/testuser/zookeeper
-]]>
- Replace DIRECTORY in the above with the path to the directory you
- would have HBase and ZooKeeper write their data. By default,
- hbase.rootdir is set to /tmp/hbase-${user.name}
- and similarly so for the default ZooKeeper data location which means you'll lose all your
- data whenever your server reboots unless you change it (Most operating systems clear
- /tmp on restart).
+
+ ]]>
+
+
+ You do not need to create the HBase data directory. HBase will do this for you. If
+ you create the directory, HBase will attempt to do a migration, which is not what you
+ want.
+
+
+ The bin/start-hbase.sh script is provided as a convenient way
+ to start HBase. Issue the command, and if all goes well, a message is logged to standard
+ output showing that HBase started successfully. You can use the jps
+ command to verify that you have one running process called HMaster
+ and at least one called HRegionServer.
+ Java needs to be installed and available. If you get an error indicating that
+ Java is not installed, but it is on your system, perhaps in a non-standard location,
+ edit the conf/hbase-env.sh file and modify the
+ JAVA_HOME setting to point to the directory that contains
+ bin/java on your system.
+
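As a rough sketch (not part of HBase itself), a suitable JAVA_HOME value to set in conf/hbase-env.sh can be derived from the location of the java binary; the /usr/bin/java fallback below is an assumption for systems where java is not on the PATH.

```shell
# Sketch: derive a JAVA_HOME candidate from the java binary on the PATH.
# The /usr/bin/java fallback is a guess, not an HBase default.
JAVA_BIN="$(command -v java || echo /usr/bin/java)"
# JAVA_HOME is the directory two levels above bin/java.
JAVA_HOME="$(dirname "$(dirname "$JAVA_BIN")")"
echo "export JAVA_HOME=$JAVA_HOME"
```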
+
+
+
+ Use HBase For the First Time
+
+ Connect to HBase.
+ Connect to your running instance of HBase using the hbase shell
+ command, located in the bin/ directory of your HBase
+ install. In this example, some usage and version information that is printed when you
+ start HBase Shell has been omitted. The HBase Shell prompt ends with a
+ > character.
+
+$ ./bin/hbase shell
+hbase(main):001:0>
+
+
+
+ Display HBase Shell Help Text.
+ Type help and press Enter to display some basic usage
+ information for HBase Shell, as well as several example commands. Notice that table
+ names, rows, and columns must all be enclosed in quote characters.
+
+
+ Create a table.
+ Use the create command to create a new table. You must specify the
+ table name and the ColumnFamily name.
+
+hbase> create 'test', 'cf'
+0 row(s) in 1.2200 seconds
+
+
+
+ List Information About your Table
+ Use the list command to confirm your table exists.
+
+hbase> list 'test'
+TABLE
+test
+1 row(s) in 0.0350 seconds
+
+=> ["test"]
+
+
+
+ Put data into your table.
+ To put data into your table, use the put command.
+
+hbase> put 'test', 'row1', 'cf:a', 'value1'
+0 row(s) in 0.1770 seconds
+
+hbase> put 'test', 'row2', 'cf:b', 'value2'
+0 row(s) in 0.0160 seconds
+
+hbase> put 'test', 'row3', 'cf:c', 'value3'
+0 row(s) in 0.0260 seconds
+
+ Here, we insert three values, one at a time. The first insert is at row1, column
+ cf:a, with a value of value1. Columns in HBase are composed of a column family prefix,
+ cf in this example, followed by a colon and then a column qualifier suffix, a in this
+ case.
+
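The family:qualifier naming convention described above can be sketched with plain shell string splitting; the column name cf:a is taken from the example.

```shell
# Sketch: split an HBase column name of the form family:qualifier,
# mirroring the cf:a convention described above.
col='cf:a'
family="${col%%:*}"     # text before the first colon
qualifier="${col#*:}"   # text after the first colon
echo "family=$family qualifier=$qualifier"
```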
+
+ Scan the table for all data at once.
+ One of the ways to get data from HBase is to scan. Use the scan
+ command to scan the table for data. You can limit your scan, but for now, all data is
+ fetched.
+
+hbase> scan 'test'
+ROW COLUMN+CELL
+ row1 column=cf:a, timestamp=1403759475114, value=value1
+ row2 column=cf:b, timestamp=1403759492807, value=value2
+ row3 column=cf:c, timestamp=1403759503155, value=value3
+3 row(s) in 0.0440 seconds
+
+
+
+ Get a single row of data.
+ To get a single row of data at a time, use the get command.
+
+hbase> get 'test', 'row1'
+COLUMN CELL
+ cf:a timestamp=1403759475114, value=value1
+1 row(s) in 0.0230 seconds
+
+
+
+ Disable a table.
+ If you want to delete a table or change its settings, and in some other
+ situations, you need to disable the table first, using the disable
+ command. You can re-enable it using the enable command.
+
+hbase> disable 'test'
+0 row(s) in 1.6270 seconds
+
+hbase> enable 'test'
+0 row(s) in 0.4500 seconds
+
+
+
+ Drop the table.
+ To drop (delete) a table, use the drop command.
+
+hbase> drop 'test'
+0 row(s) in 0.2900 seconds
+
+
+
+ Exit the HBase Shell.
+ To exit the HBase Shell and disconnect from your cluster, use the
+ quit command. HBase is still running in the background.
+
+
+
+
+ Stop HBase
+
+ In the same way that the bin/start-hbase.sh script is provided
+ to conveniently start all HBase daemons, the bin/stop-hbase.sh
+ script stops them.
+
+$ ./bin/stop-hbase.sh
+stopping hbase....................
+$
+
+
+
+ After issuing the command, it can take several minutes for the processes to shut
+ down. Use the jps command to be sure that the HMaster and HRegionServer
+ processes are shut down.
+
+
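The jps check can be sketched as a one-shot test; this assumes jps is on the PATH (it reports the daemons as stopped if jps is absent or shows neither process).

```shell
# Sketch: look for HBase daemons in jps output. If jps is missing or
# shows neither HMaster nor HRegionServer, report a clean shutdown.
if jps 2>/dev/null | grep -Eq 'HMaster|HRegionServer'; then
  echo "HBase processes still running"
else
  echo "HBase processes stopped"
fi
```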
-
- Start HBase
-
- Now start HBase:
- $ ./bin/start-hbase.sh
-starting Master, logging to logs/hbase-user-master-example.org.out
-
- You should now have a running standalone HBase instance. In standalone mode, HBase runs
- all daemons in the the one JVM; i.e. both the HBase and ZooKeeper daemons. HBase logs can be
- found in the logs subdirectory. Check them out especially if it seems
- HBase had trouble starting.
-
+
+ Intermediate - Pseudo-Distributed Local Install
+ After working your way through , you can re-configure HBase
+ to run in pseudo-distributed mode. Pseudo-distributed mode means
+ that HBase still runs completely on a single host, but each HBase daemon (HMaster,
+ HRegionServer, and Zookeeper) runs as a separate process. By default, unless you configure the
+ hbase.rootdir property as described in , your data
+ is still stored in /tmp/. In this walk-through, we store your data in
+ HDFS instead, assuming you have HDFS available. You can skip the HDFS configuration to
+ continue storing your data in the local filesystem.
- Is java installed?
-
- All of the above presumes a 1.6 version of Oracle java is
- installed on your machine and available on your path (See ); i.e. when you type java, you see output
- that describes the options the java program takes (HBase requires java 6). If this is not
- the case, HBase will not start. Install java, edit conf/hbase-env.sh,
- uncommenting the JAVA_HOME line pointing it to your java install, then,
- retry the steps above.
+ Hadoop Configuration
+ This procedure assumes that you have configured Hadoop and HDFS on your local system
+ or on a remote system, and that they are running and available. It also assumes you are
+ using Hadoop 2. Currently, the documentation on the Hadoop website does not include a
+ quick start for Hadoop 2, but the guide at http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide
+ is a good starting point.
+
+
+ Stop HBase if it is running.
+ If you have just finished and HBase is still running,
+ stop it. This procedure will create a totally new directory where HBase will store its
+ data, so any databases you created before will be lost.
+
+
+ Configure HBase.
+
+ Edit the hbase-site.xml configuration. First, add the following
+ property, which directs HBase to run in distributed mode, with one JVM instance per
+ daemon.
+
+
+ hbase.cluster.distributed
+ true
+
+ ]]>
+ Next, change the hbase.rootdir from the local filesystem to the address
+ of your HDFS instance, using the hdfs:// URI syntax. In this example,
+ HDFS is running on the localhost at port 8020.
+
+ hbase.rootdir
+ hdfs://localhost:8020/hbase
+
+ ]]>
+
+ You do not need to create the directory in HDFS. HBase will do this for you. If you
+ create the directory, HBase will attempt to do a migration, which is not what you
+ want.
+
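The rootdir URI above can be sketched as simple string assembly; localhost and port 8020 are the values assumed in the example, not universal defaults.

```shell
# Sketch: assemble the hbase.rootdir value from a NameNode host and
# port (localhost:8020, as assumed in the example configuration above).
NN_HOST=localhost
NN_PORT=8020
echo "hdfs://${NN_HOST}:${NN_PORT}/hbase"
```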
+
+ Start HBase.
+ Use the bin/start-hbase.sh command to start HBase. If your
+ system is configured correctly, the jps command should show the
+ HMaster and HRegionServer processes running.
+
+
+ Check the HBase directory in HDFS.
+ If everything worked correctly, HBase created its directory in HDFS. In the
+ configuration above, it is stored in /hbase/ on HDFS. You can use
+ the hadoop fs command in Hadoop's bin/ directory
+ to list this directory.
+
+$ ./bin/hadoop fs -ls /hbase
+Found 7 items
+drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/.tmp
+drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/WALs
+drwxr-xr-x - hbase users 0 2014-06-25 18:48 /hbase/corrupt
+drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/data
+-rw-r--r-- 3 hbase users 42 2014-06-25 18:41 /hbase/hbase.id
+-rw-r--r-- 3 hbase users 7 2014-06-25 18:41 /hbase/hbase.version
+drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/oldWALs
+
+
+
+ Create a table and populate it with data.
+ You can use the HBase Shell to create a table, populate it with data, scan and get
+ values from it, using the same procedure as in .
+
+
+ Start and stop a backup HBase Master (HMaster) server.
+
+ Running multiple HMaster instances on the same hardware does not make sense in a
+ production environment, in the same way that running a pseudo-distributed cluster does
+ not make sense for production. This step is offered for testing and learning purposes
+ only.
+
+ The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster
+ servers, which makes 10 total HMasters, counting the primary. To start a backup HMaster,
+ use the local-master-backup.sh script. For each backup master you want to
+ start, add a parameter representing the port offset for that master. Each HMaster uses
+ two ports (16000 and 16010 by default). The port offset is added to these ports, so
+ using an offset of 2, the first backup HMaster would use ports 16002 and 16012. The
+ following command starts 3 backup servers using ports 16002/16012, 16003/16013, and
+ 16005/16015.
+
+$ ./bin/local-master-backup.sh start 2 3 5
+
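The port-offset arithmetic described above can be sketched directly, using the default base ports 16000/16010 and the offsets from the example command.

```shell
# Sketch of the port-offset arithmetic: each backup master binds the
# default base ports 16000/16010 plus its offset.
for offset in 2 3 5; do
  echo "offset $offset -> ports $((16000 + offset))/$((16010 + offset))"
done
```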
+ To kill a backup master without killing the entire cluster, you need to find its
+ process ID (PID). The PID is stored in a file with a name like
+ /tmp/hbase-USER-X-master.pid.
+ The only contents of the file are the PID. You can use the kill -9
+ command to kill that PID. The following command will kill the master with port offset 1,
+ but leave the cluster running:
+
+$ cat /tmp/hbase-testuser-1-master.pid |xargs kill -9
+
+
+
+ Start and stop additional RegionServers
+ The HRegionServer manages the data in its StoreFiles as directed by the HMaster.
+ Generally, one HRegionServer runs per node in the cluster. Running multiple
+ HRegionServers on the same system can be useful for testing in pseudo-distributed mode.
+ The local-regionservers.sh command allows you to run multiple
+ RegionServers. It works in a similar way to the
+ local-master-backup.sh command, in that each parameter you provide
+ represents the port offset for an instance. Each RegionServer requires two ports, and
+ the default ports are 16200 and 16300. You can run 99 additional RegionServers, or 100
+ total, on a server. The following command starts four additional
+ RegionServers, running on sequential ports starting at 16202/16302.
+
+$ ./bin/local-regionservers.sh start 2 3 4 5
+
+ To stop a RegionServer manually, use the local-regionservers.sh
+ with the stop parameter and the offset of the server to stop.
+ $ ./bin/local-regionservers.sh stop 3
+
+
+ Stop HBase.
+ You can stop HBase the same way as in the other procedure, using the
+ bin/stop-hbase.sh command.
+
+
-
-
- Shell Exercises
-
- Connect to your running HBase via the shell.
-
- ' for list of supported commands.
-Type "exit" to leave the HBase Shell
-Version: 0.90.0, r1001068, Fri Sep 24 13:55:42 PDT 2010
-
-hbase(main):001:0>]]>
-
- Type help and then <RETURN> to see a listing
- of shell commands and options. Browse at least the paragraphs at the end of the help
- emission for the gist of how variables and command arguments are entered into the HBase
- shell; in particular note how table names, rows, and columns, etc., must be quoted.
-
- Create a table named test with a single column family named
- cf. Verify its creation by listing all tables and then insert some
- values.
-
- create 'test', 'cf'
-0 row(s) in 1.2200 seconds
-hbase(main):003:0> list 'test'
-..
-1 row(s) in 0.0550 seconds
-hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1'
-0 row(s) in 0.0560 seconds
-hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2'
-0 row(s) in 0.0370 seconds
-hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'
-0 row(s) in 0.0450 seconds]]>
-
- Above we inserted 3 values, one at a time. The first insert is at
- row1, column cf:a with a value of
- value1. Columns in HBase are comprised of a column family prefix --
- cf in this example -- followed by a colon and then a column qualifier
- suffix (a in this case).
-
- Verify the data insert by running a scan of the table as follows
-
- scan 'test'
-ROW COLUMN+CELL
-row1 column=cf:a, timestamp=1288380727188, value=value1
-row2 column=cf:b, timestamp=1288380738440, value=value2
-row3 column=cf:c, timestamp=1288380747365, value=value3
-3 row(s) in 0.0590 seconds]]>
-
- Get a single row
-
- get 'test', 'row1'
-COLUMN CELL
-cf:a timestamp=1288380727188, value=value1
-1 row(s) in 0.0400 seconds]]>
-
- Now, disable and drop your table. This will clean up all done above.
-
- h disable 'test'
-0 row(s) in 1.0930 seconds
-hbase(main):013:0> drop 'test'
-0 row(s) in 0.0770 seconds ]]>
-
- Exit the shell by typing exit.
-
- exit]]>
-
-
-
- Stopping HBase
-
- Stop your hbase instance by running the stop script.
-
- $ ./bin/stop-hbase.sh
-stopping hbase...............
+
+
+ Advanced - Fully Distributed
+ In reality, you need a fully-distributed configuration to fully test HBase and to use it
+ in real-world scenarios. In a distributed configuration, the cluster contains multiple
+ nodes, each of which runs one or more HBase daemon. These include primary and backup Master
+ instances, multiple Zookeeper nodes, and multiple RegionServer nodes.
+ This advanced quickstart adds two more nodes to your cluster. The architecture will be
+ as follows:
+
+ This quickstart assumes that each node is a virtual machine and that they are all on the
+ same network. It builds upon the previous quickstart, ,
+ assuming that the system you configured in that procedure is now node-a. Stop HBase on node-a
+ before continuing.
+
+ Be sure that all the nodes have full access to communicate, and that no firewall rules
+ are in place which could prevent them from talking to each other. If you see any errors like
+ no route to host, check your firewall.
+
+
+ Configure Password-Less SSH Access
+ node-a needs to be able to log into node-b and
+ node-c (and to itself) in order to start the daemons. The easiest way to accomplish this is
+ to use the same username on all hosts, and configure password-less SSH login from
+ node-a to each of the others.
+
+ On node-a, generate a key pair.
+ While logged in as the user who will run HBase, generate an SSH key pair, using the
+ following command:
+
+ $ ssh-keygen -t rsa
+ If the command succeeds, the location of the key pair is printed to standard output.
+ The default name of the public key is id_rsa.pub.
+
+
+ Create the directory that will hold the shared keys on the other nodes.
+ On node-b and node-c, log in as the HBase user and create
+ a .ssh/ directory in the user's home directory, if it does not
+ already exist. If it already exists, be aware that it may already contain other keys.
+
+
+ Copy the public key to the other nodes.
+ Securely copy the public key from node-a to each of the nodes, by
+ using the scp or some other secure means. On each of the other nodes,
+ create a new file called .ssh/authorized_keys if it does
+ not already exist, and append the contents of the
+ id_rsa.pub file to the end of it. Note that you also need to do
+ this for node-a itself.
+ $ cat id_rsa.pub >> ~/.ssh/authorized_keys
+
+
+ Test password-less login.
+ If you performed the procedure correctly, you should not be prompted for a password
+ when you SSH from node-a to either of the other nodes, using the same username.
+
+
+
+ Since node-b will run a backup Master, repeat the procedure above,
+ substituting node-b everywhere you see node-a. Be sure not to
+ overwrite your existing .ssh/authorized_keys files, but concatenate
+ the new key onto the existing file using the >> operator rather than
+ the > operator.
+
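The difference between the >> and > operators mentioned above can be demonstrated safely with a temp file standing in for .ssh/authorized_keys.

```shell
# Demonstration of >> (append) versus > (truncate), using a temp file
# as a stand-in for .ssh/authorized_keys.
f=$(mktemp)
echo "existing-key" > "$f"   # > replaces the file's contents
echo "new-key" >> "$f"       # >> preserves the existing entry
wc -l < "$f"                 # both keys survive: 2 lines
rm -f "$f"
```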
+
+
+
+ Prepare node-a
+ node-a will run your primary master and ZooKeeper processes, but no RegionServers.
+
+ Stop the RegionServer from starting on node-a.
+ Edit conf/regionservers and remove the line which contains
+ localhost. Add lines with the hostnames or IP addresses for
+ node-b and
+ node-c. Save the file.
+
+
+ Configure HBase to use node-b as a backup master.
+ Create a new file in conf/ called
+ backup-masters, and add a new line to it with the hostname for
+ node-b. In this demonstration, the hostname is
+ node-b.example.com.
+
+
+ Configure ZooKeeper
+ In reality, you should carefully consider your ZooKeeper configuration. You can find
+ out more about configuring ZooKeeper in . For now, we want to configure
+ ZooKeeper to run on all three nodes.
+ On node-a, edit conf/hbase-site.xml and add the following
+ properties.
+
+ hbase.zookeeper.quorum
+ node-a.example.com,node-b.example.com,node-c.example.com
+
+
+ hbase.zookeeper.property.dataDir
+ /usr/local/zookeeper
+
+ ]]>
+
+
+ Everywhere in your configuration that you have referred to node-a as
+ localhost, change the reference to point to the hostname that
+ the other nodes will use to refer to node-a. In these examples, the
+ hostname is node-a.example.com.
+
+
+
+ Prepare node-b and node-c
+ node-b will run a backup master server and a ZooKeeper instance.
+
+ Download and unpack HBase.
+ Download and unpack HBase to node-b, just as you did for the standalone
+ and pseudo-distributed quickstarts.
+
+
+ Copy the configuration files from node-a to node-b and
+ node-c.
+ Each node of your cluster needs to have the same configuration information. Copy the
+ contents of the conf/ directory to the conf/
+ directory on node-b and node-c.
+
+
+
+
+ Start and Test Your Cluster
+
+ Be sure HBase is not running on any node.
+ If you forgot to stop HBase from previous testing, you will have errors. Check to
+ see whether HBase is running on any of your nodes by using the jps
+ command. Look for the processes HMaster,
+ HRegionServer, and HQuorumPeer. If they exist,
+ kill them.
+
+
+ Start the cluster.
+ On node-a, issue the start-hbase.sh command. Your
+ output will be similar to that below.
+
+$ bin/start-hbase.sh
+node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out
+node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out
+node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out
+starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out
+node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out
+node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out
+node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-nodeb.example.com.out
+
+ ZooKeeper starts first, followed by the master, then the RegionServers.
+
+
+ Verify that the processes are running.
+ On each node of the cluster, run the jps command and verify that
+ the correct processes are running on each server. You may see additional Java processes
+ running on your servers as well, if they are used for other purposes.
+
+ node-a jps Output
+
+$ jps
+20355 Jps
+20071 HQuorumPeer
+20137 HMaster
+
+
+
+ node-b jps Output
+
+$ jps
+15930 HRegionServer
+16194 Jps
+15838 HQuorumPeer
+16010 HMaster
+
+
+
+ node-c jps Output
+
+$ jps
+13901 Jps
+13639 HQuorumPeer
+13737 HRegionServer
+
+
+
+
+ Browse to the Web UI.
+ If everything is set up correctly, you should be able to see the web UI for the
+ primary Master at http://node-a.example.com:16010/, or the web UI for the
+ secondary master at http://node-b.example.com:16010/. You can see
+ the web UI for each of the RegionServers at port 16300 of their IP addresses, or by
+ clicking their links in the web UI for the Master.
+
+
+ Test what happens when nodes or services disappear.
+ With a three-node cluster like you have configured, things will not be very
+ resilient. Still, you can test what happens when the primary Master or a RegionServer
+ disappears, by killing the processes and watching the logs.
+
+
-
+
Where to go next
- The above described standalone setup is good for testing and experiments only. In the
+ In the
next chapter, , we'll go into depth on the different HBase run modes, system
requirements for running HBase, and critical configurations for setting up a distributed HBase