Index: src/java/overview.html =================================================================== --- src/java/overview.html (revision 818531) +++ src/java/overview.html (working copy) @@ -22,120 +22,117 @@
dfs.datanode.max.xcievers).
- Default is 256. Up this limit on your hadoop cluster.
- dfs.datanode.max.xcievers).
+ Default is 256. Up this limit on your hadoop cluster.
+ C:\cygwin you
-should modify the following appropriately.
--
--For additional information, see the -Hadoop Quick Start Guide - --HOME=c:\cygwin\home\jim -ANT_HOME=(wherever you installed ant) -JAVA_HOME=(wherever you installed java) -PATH=C:\cygwin\bin;%JAVA_HOME%\bin;%ANT_HOME%\bin; other windows stuff -SHELL=/bin/bash --
-What follows presumes you have obtained a copy of HBase, -see Releases, and are installing -for the first time. If upgrading your -HBase instance, see Upgrading. -
-Three modes are described: standalone, pseudo-distributed (where all servers are run on -a single host), and distributed. If new to hbase start by following the standalone instruction. -
-
-Whatever your mode, define ${HBASE_HOME} to be the location of the root of your HBase installation, e.g.
-/user/local/hbase. Edit ${HBASE_HOME}/conf/hbase-env.sh. In this file you can
-set the heapsize for HBase, etc. At a minimum, set JAVA_HOME to point at the root of
-your Java installation.
+If you are running HBase on Windows, you must install
+Cygwin
+to have a *nix-like environment for the shell scripts. The full details
+are explained in
+the Windows Installation
+guide.
+
What follows presumes you have obtained a copy of HBase, see Releases, and
+are installing for the first time.
+If upgrading your HBase instance, see Upgrading.
Three modes are described: standalone, pseudo-distributed
+(where all servers are run on a single host), and distributed.
+If new to HBase, start by following the standalone instructions.
Whatever your mode, define ${HBASE_HOME} to be the
+location of the root of your HBase installation, e.g. /usr/local/hbase.
+Edit ${HBASE_HOME}/conf/hbase-env.sh. In this file you can
+set the heapsize for HBase, etc. At a minimum, set JAVA_HOME
+to point at the root of your Java installation.
-If you are running a standalone operation, there should be nothing further to configure; proceed to -Running and Confirming Your Installation. If you are running a distributed -operation, continue reading. -
+If you are running a standalone operation, there should be +nothing further to configure; proceed to Running +and Confirming Your Installation.
-Distributed mode requires an instance of the Hadoop Distributed File System (DFS). -See the Hadoop -requirements and instructions for how to set up a DFS. -
+Distributed mode requires an instance of the Hadoop Distributed +File System (DFS). See the Hadoop +requirements and instructions for how to set up a DFS.
A pseudo-distributed operation is simply a distributed operation run on a single host.
-Once you have confirmed your DFS setup, configuring HBase for use on one host requires modification of
-${HBASE_HOME}/conf/hbase-site.xml, which needs to be pointed at the running Hadoop DFS instance.
-Use hbase-site.xml to override the properties defined in
-${HBASE_HOME}/conf/hbase-default.xml (hbase-default.xml itself
-should never be modified). At a minimum the hbase.rootdir property should be redefined
-in hbase-site.xml to point HBase at the Hadoop filesystem to use. For example, adding the property
-below to your hbase-site.xml says that HBase should use the /hbase directory in the
-HDFS whose namenode is at port 9000 on your local machine:
-
A pseudo-distributed operation is simply a distributed operation
+run on a single host. Once you have confirmed your DFS setup,
+configuring HBase for use on one host requires modification of ${HBASE_HOME}/conf/hbase-site.xml,
+which needs to be pointed at the running Hadoop DFS instance. Use hbase-site.xml
+to override the properties defined in ${HBASE_HOME}/conf/hbase-default.xml
+(hbase-default.xml itself should never be modified). At a
+minimum the hbase.rootdir property should be redefined in hbase-site.xml
+to point HBase at the Hadoop filesystem to use. For example, adding the
+property below to your hbase-site.xml says that HBase
+should use the /hbase directory in the HDFS whose namenode
+is at port 9000 on your local machine:
<configuration> ... @@ -148,19 +145,18 @@ ... </configuration>-
Note: Let hbase create the directory. If you don't, you'll get warning saying hbase -needs a migration run because the directory is missing files expected by hbase (it'll -create them if you let it). -
+Note: Let hbase create the directory. If you don't, you'll get a +warning saying hbase needs a migration run because the directory is +missing files expected by hbase (it'll create them if you let it).
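The override described above can be sketched as a generated hbase-site.xml. The hdfs://localhost:9000/hbase value matches the example in the text; the output directory below is a throwaway stand-in for ${HBASE_HOME}/conf:

```shell
# Sketch: generate a minimal hbase-site.xml overriding hbase.rootdir.
# conf_dir is a stand-in for ${HBASE_HOME}/conf.
conf_dir="${TMPDIR:-/tmp}/hbase-demo-conf"
mkdir -p "$conf_dir"

cat > "$conf_dir/hbase-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- the /hbase directory in the HDFS whose namenode is at port 9000 -->
    <value>hdfs://localhost:9000/hbase</value>
  </property>
</configuration>
EOF
echo "wrote $conf_dir/hbase-site.xml"
```

Remember to let HBase create the /hbase directory itself, as the note above explains.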
-For running a fully-distributed operation on more than one host, the following -configurations must be made in addition to those described in the -pseudo-distributed operation section above. -In this mode, a ZooKeeper cluster is required.
-In hbase-site.xml, set hbase.cluster.distributed to 'true'.
-
--+Fully-Distributed Operation
+For running a fully-distributed operation on more than one host, +the following configurations must be made in addition to those +described in the pseudo-distributed +operation section above. In this mode, a ZooKeeper cluster is required.
+In
hbase-site.xml, sethbase.cluster.distributed+to 'true'. +- -<configuration> ... <property> @@ -173,68 +169,56 @@ </property> ... </configuration> ---In fully-distributed operation, you probably want to change your
-hbase.rootdir-from localhost to the name of the node running the HDFS namenode. In addition -tohbase-site.xmlchanges, a fully-distributed operation requires that you -modify${HBASE_HOME}/conf/regionservers. -Theregionserverfile lists all hosts running HRegionServers, one host per line -(This file in HBase is like the hadoop slaves file at${HADOOP_HOME}/conf/slaves). --A distributed HBase depends on a running ZooKeeper cluster. -HBase can manage a ZooKeeper cluster for you, or you can manage it on your own -and point HBase to it. -To toggle this option, use the
HBASE_MANAGES_ZKvariable in-${HBASE_HOME}/conf/hbase-env.sh. -This variable, which defaults totrue, tells HBase whether to -start/stop the ZooKeeper quorum servers alongside the rest of the servers. +
-To point HBase at an existing ZooKeeper cluster, add your zoo.cfg
-to the CLASSPATH.
-HBase will see this file and use it to figure out where ZooKeeper is.
-Additionally set HBASE_MANAGES_ZK in ${HBASE_HOME}/conf/hbase-env.sh
- to false so that HBase doesn't mess with your ZooKeeper setup:
-
+For more information about setting up a ZooKeeper cluster on your own, +see the ZooKeeper Getting +Started Guide. HBase currently uses ZooKeeper version 3.2.0, so any +cluster setup with a 3.x.x version of ZooKeeper should work. +In fully-distributed operation, you probably want to change your +
+hbase.rootdirfrom localhost to the name of the node +running the HDFS namenode. In addition tohbase-site.xml+changes, a fully-distributed operation requires that you modify${HBASE_HOME}/conf/regionservers. +Theregionserverfile lists all hosts running +HRegionServers, one host per line (This file in HBase is like the hadoop +slaves file at${HADOOP_HOME}/conf/slaves).A distributed HBase depends on a running ZooKeeper cluster. HBase +can manage a ZooKeeper cluster for you, or you can manage it on your own +and point HBase to it. To toggle this option, use the
+HBASE_MANAGES_ZK+variable in${HBASE_HOME}/conf/hbase-env.sh. This +variable, which defaults totrue, tells HBase whether to +start/stop the ZooKeeper quorum servers alongside the rest of the +servers.To point HBase at an existing ZooKeeper cluster, add your
zoo.cfg+to theCLASSPATH. HBase will see this file and use it to +figure out where ZooKeeper is. Additionally setHBASE_MANAGES_ZK+in${HBASE_HOME}/conf/hbase-env.shtofalse+so that HBase doesn't mess with your ZooKeeper setup:... # Tell HBase whether it should manage it's own instance of Zookeeper or not. export HBASE_MANAGES_ZK=false --For more information about setting up a ZooKeeper cluster on your own, see -the ZooKeeper Getting Started Guide. -HBase currently uses ZooKeeper version 3.2.0, so any cluster setup with a 3.x.x -version of ZooKeeper should work. - --To have HBase manage the ZooKeeper cluster, you can use a
zoo.cfg- file as above, or edit the options directly in the${HBASE_HOME}/conf/hbase-site.xml. -Every option from thezoo.cfghas a corresponding property in the -XML configuration file namedhbase.zookeeper.property.OPTION. -For example, theclientPortsetting in ZooKeeper can be changed by -setting thehbase.zookeeper.property.clientPortproperty. -For the full list of available properties, see ZooKeeper'szoo.cfg. +
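A minimal sketch of the hbase-env.sh change described above; the file path here is a temporary stand-in for ${HBASE_HOME}/conf/hbase-env.sh:

```shell
# Sketch: tell HBase not to manage ZooKeeper itself.
# env_file stands in for ${HBASE_HOME}/conf/hbase-env.sh.
env_file="${TMPDIR:-/tmp}/hbase-env-demo.sh"

cat > "$env_file" <<'EOF'
# Tell HBase whether it should manage its own instance of ZooKeeper or not.
export HBASE_MANAGES_ZK=false
EOF

# Sourcing the file leaves HBASE_MANAGES_ZK=false in the environment,
# so the start scripts will not start/stop ZooKeeper quorum servers.
. "$env_file"
echo "HBASE_MANAGES_ZK=$HBASE_MANAGES_ZK"
```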
To have HBase manage the ZooKeeper cluster, you can use a zoo.cfg
+file as above, or edit the options directly in the ${HBASE_HOME}/conf/hbase-site.xml.
+Every option from the zoo.cfg has a corresponding property
+in the XML configuration file named hbase.zookeeper.property.OPTION.
+For example, the clientPort setting in ZooKeeper can be
+changed by setting the hbase.zookeeper.property.clientPort
+property. For the full list of available properties, see ZooKeeper's zoo.cfg.
For the default values used by HBase, see ${HBASE_HOME}/conf/hbase-default.xml.
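The naming rule above (zoo.cfg option → hbase.zookeeper.property.OPTION) can be illustrated with a tiny helper; the function itself is hypothetical, not part of HBase:

```shell
# Sketch: every zoo.cfg option maps to an HBase property named
# hbase.zookeeper.property.OPTION. This helper is purely illustrative.
zk_option_to_hbase_property() {
  echo "hbase.zookeeper.property.$1"
}

# The clientPort setting in ZooKeeper maps as follows:
prop=$(zk_option_to_hbase_property clientPort)
echo "$prop"   # hbase.zookeeper.property.clientPort
```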
-At minimum, you should set the list of servers that you want ZooKeeper to run
-on using the hbase.zookeeper.quorum property.
-This property defaults to localhost which is not suitable for a
-fully distributed HBase.
-It is recommended to run a ZooKeeper quorum of 5 or 7 machines, and give each
-server around 1GB to ensure that they don't swap.
-It is also recommended to run the ZooKeeper servers on separate machines from
-the Region Servers with their own disks.
-If this is not easily doable for you, choose 5 of your region servers to run the
-ZooKeeper servers on.
-
-As an example, to have HBase manage a ZooKeeper quorum on nodes -rs{1,2,3,4,5}.example.com, bound to port 2222 (the default is 2181), use: -
+Note that you can use HBase in this manner to spin up a ZooKeeper +cluster, unrelated to HBase. Just make sure to setAt minimum, you should set the list of servers that you want +ZooKeeper to run on using the
+hbase.zookeeper.quorum+property. This property defaults tolocalhostwhich is not +suitable for a fully distributed HBase. It is recommended to run a +ZooKeeper quorum of 5 or 7 machines, and give each server around 1GB to +ensure that they don't swap. It is also recommended to run the ZooKeeper +servers on separate machines from the Region Servers with their own +disks. If this is not easily doable for you, choose 5 of your region +servers to run the ZooKeeper servers on.As an example, to have HBase manage a ZooKeeper quorum on nodes +rs{1,2,3,4,5}.example.com, bound to port 2222 (the default is 2181), +use:
${HBASE_HOME}/conf/hbase-env.sh: ... @@ -266,94 +250,105 @@ </property> ... </configuration> -- --When HBase manages ZooKeeper, it will start/stop the ZooKeeper servers as a part -of the regular start/stop scripts. If you would like to run it yourself, you can -do: -
++When HBase manages ZooKeeper, it will start/stop the ZooKeeper +servers as a part of the regular start/stop scripts. If you would like +to run it yourself, you can do:
${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper --Note that you can use HBase in this manner to spin up a ZooKeeper cluster, -unrelated to HBase. Just make sure to setHBASE_MANAGES_ZKto -falseif you want it to stay up so that when HBase shuts down it -doesn't take ZooKeeper with it. - +
HBASE_MANAGES_ZK
+to false if you want it to stay up so that when HBase shuts
+down it doesn't take ZooKeeper with it.
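The earlier quorum example (nodes rs{1,2,3,4,5}.example.com bound to port 2222) could be written as the following hbase-site.xml fragment; the output path is a throwaway stand-in for the real conf directory:

```shell
# Sketch: hbase-site.xml fragment for an HBase-managed ZooKeeper quorum
# on rs{1..5}.example.com with clientPort 2222 (default is 2181).
site_file="${TMPDIR:-/tmp}/hbase-site-zk-demo.xml"

cat > "$site_file" <<'EOF'
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>rs1.example.com,rs2.example.com,rs3.example.com,rs4.example.com,rs5.example.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2222</value>
  </property>
</configuration>
EOF
echo "wrote $site_file"
```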
-Of note, if you have made HDFS client configuration on your hadoop cluster, HBase will not -see this configuration unless you do one of the following: +
Of note, if you have made HDFS client configuration on +your hadoop cluster, HBase will not see this configuration unless you do +one of the following:
HADOOP_CONF_DIR to CLASSPATH in hbase-env.shhdfs-site.xml (or hadoop-site.xml) to ${HBASE_HOME}/conf, orhbase-site.xmlHADOOP_CONF_DIR to CLASSPATH
+ in hbase-env.shhdfs-site.xml (or hadoop-site.xml)
+ to ${HBASE_HOME}/conf, orhbase-site.xmldfs.replication. If for example,
-you want to run with a replication factor of 5, hbase will create files with the default of 3 unless
-you do the above to make the configuration available to HBase.
-
+An example of such an HDFS client configuration is dfs.replication.
+If, for example, you want to run with a replication factor of 5, hbase
+will create files with the default of 3 unless you do the above to make
+the configuration available to HBase.
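Option (1) above might look like this in hbase-env.sh; the Hadoop conf path is illustrative, and the fragment is written to a throwaway location rather than the real ${HBASE_HOME}/conf:

```shell
# Sketch: put HADOOP_CONF_DIR on HBase's classpath via hbase-env.sh so
# HBase sees HDFS client settings such as dfs.replication.
# The /usr/local/hadoop/conf default below is purely illustrative.
env_file="${TMPDIR:-/tmp}/hbase-env-classpath-demo.sh"

cat > "$env_file" <<'EOF'
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/usr/local/hadoop/conf}
# Prepend the Hadoop client configuration to HBase's classpath.
export HBASE_CLASSPATH=$HADOOP_CONF_DIR${HBASE_CLASSPATH:+:$HBASE_CLASSPATH}
EOF

. "$env_file"
echo "HBASE_CLASSPATH=$HBASE_CLASSPATH"
```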
-If you are running in standalone, non-distributed mode, HBase by default uses -the local filesystem.
- -If you are running a distributed cluster you will need to start the Hadoop DFS daemons and -ZooKeeper Quorum -before starting HBase and stop the daemons after HBase has shut down.
-Start and
-stop the Hadoop DFS daemons by running ${HADOOP_HOME}/bin/start-dfs.sh.
-You can ensure it started properly by testing the put and get of files into the Hadoop filesystem.
-HBase does not normally use the mapreduce daemons. These do not need to be started.
Start up your ZooKeeper cluster.
- -Start HBase with the following command: +
If you are running in standalone, non-distributed mode,
+HBase by default uses the local filesystem. No special action
+needs to be taken as the local filesystem is always on.
+However, if you are running a distributed cluster you will need
+to start the Hadoop DFS daemons and ZooKeeper Quorum before starting
+HBase and stop the daemons after HBase has shut down:
+
${HADOOP_HOME}/bin/start-dfs.sh.
+ You can ensure it started properly by testing the put and get of files
+ into the Hadoop filesystem.Start HBase with the following command:
${HBASE_HOME}/bin/start-hbase.sh
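The required ordering (DFS and the ZooKeeper quorum up before HBase, and down only after HBase has stopped) can be captured in a hypothetical preflight helper; the function and the install paths are illustrative, not part of the HBase scripts:

```shell
# Hypothetical preflight check before bringing up a distributed cluster:
# HDFS and ZooKeeper must be running before ${HBASE_HOME}/bin/start-hbase.sh.
preflight() {
  for v in HADOOP_HOME HBASE_HOME; do
    # Indirect expansion via eval keeps this POSIX-sh compatible.
    eval "val=\${$v:-}"
    if [ -z "$val" ]; then
      echo "ERROR: $v is not set"
      return 1
    fi
  done
  echo "preflight OK: start DFS, then ZooKeeper, then HBase"
}

# Example with illustrative install locations.
HADOOP_HOME=/usr/local/hadoop
HBASE_HOME=/usr/local/hbase
preflight
```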
-
-Once HBase has started, enter ${HBASE_HOME}/bin/hbase shell to obtain a
-shell against HBase from which you can execute commands.
-Test your installation by creating, viewing, and dropping
-To stop HBase, exit the HBase shell and enter:
-
Once HBase has started, enter ${HBASE_HOME}/bin/hbase
+shell to obtain a shell against HBase from which you can execute
+commands. Test your installation by creating, viewing, and dropping a
+table. Use exit to leave the shell.
To stop HBase, exit the HBase shell and enter:
${HBASE_HOME}/bin/stop-hbase.sh
--If you are running a distributed operation, be sure to wait until HBase has shut down completely -before stopping the Hadoop daemons. -
-
-The default location for logs is ${HBASE_HOME}/logs.
-
HBase also puts up a UI listing vital attributes. By default its deployed on the master host -at port 60010 (HBase regionservers listen on port 60020 by default and put up an informational -http server at 60030).
+If you are running a distributed operation, be sure to wait until +HBase has shut down completely before stopping the Hadoop daemons.
+The default location for logs is ${HBASE_HOME}/logs.
HBase also puts up a UI listing vital attributes. By default it is +deployed on the master host at port 60010 (HBase regionservers listen on +port 60020 by default and put up an informational http server at 60030).
-After installing a new HBase on top of data written by a previous HBase version, before
-starting your cluster, run the ${HBASE_DIR}/bin/hbase migrate migration script.
-It will make any adjustments to the filesystem data under hbase.rootdir necessary to run
-the HBase version. It does not change your install unless you explicitly ask it to.
-
After installing a new HBase on top of data written by a previous
+HBase version, before starting your cluster, run the ${HBASE_DIR}/bin/hbase
+migrate migration script. It will make any adjustments to the filesystem
+data under hbase.rootdir necessary to run the HBase
+version. It does not change your install unless you explicitly ask it
+to.
If your client is NOT Java, consider the Thrift or REST libraries.
+If your client is NOT Java, consider the Thrift or REST +libraries.
-HBase is a distributed, column-oriented store, modeled after Google's BigTable. HBase is built on top of Hadoop for its MapReduce and distributed file system implementation. All these projects are open-source and part of the Apache Software Foundation.
+As distributed, large-scale platforms, the Hadoop and HBase projects mainly focus on *nix environments for production installations. However, being developed in Java, both projects are fully portable across platforms and, hence, also to the Windows operating system. For ease of development the projects rely on Cygwin to provide a *nix-like environment on Windows to run the shell scripts.
+ +This document explains the intricacies of running HBase on Windows using Cygwin as an all-in-one single-node installation for testing and development. The HBase Overview and QuickStart guides, on the other hand, go a long way in explaining how to set up HBase in more complex deployment scenarios.
+ +For running HBase on Windows, 3 technologies are required: Java, Cygwin and SSH. The following paragraphs detail the installation of each of the aforementioned technologies.
+ +HBase depends on the Java Platform, Standard Edition, 6 Release. So the target system has to be provided with at least the Java Runtime Environment (JRE); however if the system will also be used for development, the Java Development Kit (JDK) is preferred. You can download the latest versions of both from Sun's download page. Installation is a simple GUI wizard that guides you through the process.
+ +Cygwin is probably the oddest technology in this solution stack. It provides a dynamic link library that emulates most of a *nix environment on Windows. On top of that a whole bunch of the most common *nix tools are supplied. Combined, the DLL with the tools form a very *nix-alike environment on Windows.
+For installation, Cygwin provides the setup.exe utility that tracks the versions of all installed components on the target system and provides the mechanism for installing or updating everything from the mirror sites of Cygwin.
To support installation, the setup.exe utility uses 2 directories on the target system. The Root directory for Cygwin (defaults to C:\cygwin) which will become / within the eventual Cygwin installation; and the Local Package directory (e.g. C:\cygsetup) that is the cache where setup.exe stores the packages before they are installed. The cache must not be the same folder as the Cygwin root.
Perform the following steps to install Cygwin, which are elaborately detailed in the 2nd chapter of the Cygwin User's Guide:
+ +Administrator privileges on the target system.C:\cygwin\root and C:\cygwin\setup folders.setup.exe utility and save it to the Local Package directory.setup.exe utility,
+Install from Internet option,setup.exe utility in the Local Package folder.CYGWIN_HOME system-wide environment variable that points to your Root directory.%CYGWIN_HOME%\bin to the end of your PATH environment variable.Cygwin.bat command in the Root folder. You should end up in a terminal window that is running a Bash shell. Test the shell by issuing the following commands:
+cd / should take you to the Root directory in Cygwin;the ls command should list all files and folders in the current directory.Use the exit command to end the terminal.
+ +setup.exe utility.Next button until the Select Packages panel is shown.View button to toggle to the list view, which is ordered alfabetically on Package, making it easier to find the packages we'll need.Skip) so it's marked for installation. Use the Next button to download and install the packages.
+Download the latest release of HBase from the website. As the HBase distributable is just a zipped archive, installation is as simple as unpacking the archive so it ends up in its final installation directory. Notice that HBase has to be installed in Cygwin and a good directory suggestion is to use /usr/local/ (or [Root directory]\usr\local in Windows slang). You should end up with a /usr/local/hbase-<version> installation in Cygwin.
There are 3 parts left to configure: Java, SSH and HBase itself. The following paragraphs explain each topic in detail.
+ +One important thing to remember in shell scripting in general (i.e. *nix and Windows) is that managing, manipulating and assembling path names that contain spaces can be very hard, due to the need to escape and quote those characters and strings. So we try to stay away from spaces in path names. *nix environments can help us out here very easily by using symbolic links.
+ +/usr/local to the Java home directory by using the following command and substituting the name of your chosen Java environment:
+ln -s /cygdrive/c/Program\ Files/Java/<jre name> /usr/local/<jre name>+
cd /usr/local/<jre name> and issuing the command ./bin/java -version. This should output your version of the chosen JRE.Configuring SSH is quite elaborate, but primarily a question of launching it by default as a Windows service.
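The symlink trick can be tried safely on throwaway directories; the paths below merely mimic "C:\Program Files\Java" (a path with a space) and /usr/local:

```shell
# Demonstration of the space-avoidance trick: link a path containing a
# space to a space-free name. These directories are temporary stand-ins
# for the real Cygwin /usr/local and /cygdrive/c/Program Files/Java.
base="${TMPDIR:-/tmp}/symlink-demo"
rm -rf "$base"
mkdir -p "$base/Program Files/Java/jre6" "$base/usr/local"

# Space-free alias pointing at the real (space-containing) directory.
ln -s "$base/Program Files/Java/jre6" "$base/usr/local/jre6"

# The space-free path now resolves to the real directory.
ls "$base/usr/local/jre6" >/dev/null && echo "link works"
```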
+ +Run as Administrator.Use the ls -l command on the different files. Also, notice the auto-completion feature in the shell using <TAB> is extremely handy in these situations.
+chmod +r /etc/passwd to make the passwords file readable for allchmod u+w /etc/passwd to make the passwords file writable for the ownerchmod +r /etc/group to make the groups file readable for allchmod u+w /etc/group to make the groups file writable for the ownerchmod 755 /var to make the var folder writable to owner and readable and executable to allPARANOID line:
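The permission changes above can be rehearsed on a throwaway file instead of the real /etc/passwd:

```shell
# Sketch of the permission fixes above, on a throwaway file rather than
# /etc/passwd: readable for all (+r), writable for the owner (u+w).
f="${TMPDIR:-/tmp}/passwd-demo"
: > "$f"           # create an empty file
chmod 000 "$f"     # start from no permissions at all
chmod +r "$f"      # readable for all
chmod u+w "$f"     # writable for the owner
ls -l "$f"
```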
+ALL : localhost 127.0.0.1/32 : allowALL : [::1]/128 : allowssh-host-config
+/etc/ssh_config, answer yes./etc/sshd_config, answer yes.yes.sshd as a service, answer yes. Make sure you started your shell as Administrator!<enter> as the default is ntsec.sshd account, answer yes.no as the default will suffice.cyg_server account, answer yes. Enter a password for the account.net start sshd or cygrunsrv --start sshd. Notice that cygrunsrv is the utility that makes the process run as a Windows service. Confirm that you see a message stating that the CYGWIN sshd service was started successfully.mkpasswd -cl > /etc/passwdmkgroup --local > /etc/groupwhoami to verify your userIDssh localhost to connect to the system itself
+yes when presented with the server's fingerprintexit command should take you back to your first shell in CygwinExit should terminate the Cygwin shell.[installation directory] as working directory.
+./conf/hbase-env.sh to configure its dependencies on the runtime environment. Copy and uncomment the following lines just underneath their original, change them to fit your environment. They should read something like:
+export JAVA_HOME=/usr/local/<jre name>export HBASE_IDENT_STRING=$HOSTNAME as this most likely does not include spaces.hbase-default.xml file for configuration. Some properties do not resolve to existing directories because the JVM runs on Windows. This is the major issue to keep in mind when working with Cygwin: within the shell all paths are *nix-alike, hence relative to the root /. However, every parameter that is to be consumed within the Windows processes themselves needs to be a Windows setting, hence C:\-alike. Change the following properties in the configuration file, adjusting paths where necessary to conform with your own installation:
+hbase.rootdir must read e.g. file:///C:/cygwin/root/tmp/hbase/datahbase.tmp.dir must read C:/cygwin/root/tmp/hbase/tmphbase.zookeeper.quorum must read 127.0.0.1 because for some reason localhost doesn't seem to resolve properly on Cygwin.hbase.rootdir and hbase.tmp.dir directories exist and have the proper rights set up e.g. by issuing a chmod 777 on them.cd /usr/local/hbase-<version>, preferably using auto-completion../bin/start-hbase.sh
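A sketch of the Windows-style values described above, written to a throwaway file standing in for ${HBASE_HOME}/conf/hbase-site.xml:

```shell
# Sketch: on Cygwin, values consumed by the JVM must be Windows-style
# (C:/...) even though shell paths are *nix-style. The values mirror the
# ones recommended above; the output path is a throwaway stand-in.
site_file="${TMPDIR:-/tmp}/hbase-site-cygwin-demo.xml"

cat > "$site_file" <<'EOF'
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///C:/cygwin/root/tmp/hbase/data</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>C:/cygwin/root/tmp/hbase/tmp</value>
  </property>
  <property>
    <!-- 127.0.0.1 because localhost may not resolve properly on Cygwin -->
    <name>hbase.zookeeper.quorum</name>
    <value>127.0.0.1</value>
  </property>
</configuration>
EOF
echo "wrote $site_file"
```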
+yes../logs directory for any exceptions../bin/hbase shellcreate 'test', 'data'listput 'test', 'row1', 'data:1', 'value1' +put 'test', 'row2', 'data:2', 'value2' +put 'test', 'row3', 'data:3', 'value3'+
scan 'test' that should list all the rows previously inserted. Notice how 3 new columns were added without changing the schema!disable 'test' followed by drop 'test' and verified by list which should give an empty listing.exit./bin/stop-hbase.sh command. And wait for it to complete!!! Killing the process might corrupt your data on disk../logs directory.#hbase@freenode.net). People are very active and keen to help out!