Cluster Replication
See Cluster Replication.
+
+ Status
+
+ This package is experimental-quality software and is meant only as a base
+ for future development. The current implementation offers the following
+ features:
+
+
+ Master/Slave replication.
+ Master/Master replication.
+ Cyclic replication.
+ Replication of scoped families in user tables.
+ Start/stop replication stream.
+ Supports clusters of different sizes.
+ Handling of partitions longer than 10 minutes.
+ Ability to add/remove slave clusters at runtime.
+ MapReduce job to compare tables on two clusters.
+
+
+ Please report any bugs you find on the project's Jira.
+
+
+
+
+ Requirements
+
+ Before trying out replication, make sure to review the following requirements:
+
+ You must manage ZooKeeper yourself rather than letting HBase manage it,
+ and it must always be available during the deployment.
+ All machines in both clusters must be able to reach every other
+ machine, since replication goes from any region server on the master
+ cluster to any region server on the slave cluster. This also applies to
+ the ZooKeeper clusters.
+ Both clusters should run the same HBase and Hadoop major revision.
+ For example, 0.90.1 on the master and 0.90.0 on the slave is
+ acceptable, but 0.90.1 and 0.89.20100725 is not.
+ Every table that contains families scoped for replication must exist
+ on every cluster under exactly the same name, and the replicated
+ families must have the same names as well.
+ Multiple slaves, Master/Master, and cyclic replication require
+ version 0.92 or greater.
+
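+ The revision requirement above can be checked mechanically. The following is
+ a minimal Python sketch, not part of HBase; the function names are
+ hypothetical, and it assumes that "major revision" means the first two
+ version components, which is what the 0.90.1 / 0.90.0 / 0.89.20100725
+ example suggests.
+

```python
def revision_prefix(version, parts=2):
    """Return the leading numeric components, e.g. '0.90.1' -> (0, 90)."""
    return tuple(int(p) for p in version.split(".")[:parts])


def same_major_revision(master_version, slave_version):
    # Assumption: "major revision" means the first two components
    # (0.90, 0.89, ...), as implied by the example in the requirements.
    return revision_prefix(master_version) == revision_prefix(slave_version)


print(same_major_revision("0.90.1", "0.90.0"))         # True
print(same_major_revision("0.90.1", "0.89.20100725"))  # False
```

+
+ The same check would need to be applied to the Hadoop versions of both
+ clusters.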
+
+
+
+ Deployment
+
+ The following steps describe how to enable replication from one cluster
+ to another.
+
+
+ Edit ${HBASE_HOME}/conf/hbase-site.xml
+ on both clusters to add the following configuration:
+
+ <property>
+   <name>hbase.replication</name>
+   <value>true</value>
+ </property>
+
+ Deploy the files, and then restart HBase if it was running.
+
+
+ Run the following command in the master cluster's shell while the cluster is running:
+ add_peer
+ This shows the help for setting up the replication stream between
+ the two clusters. If both clusters use the same ZooKeeper cluster, they must
+ use different values for
+ zookeeper.znode.parent, since they cannot
+ write to the same folder.
+
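+ As an illustration of the shared-ZooKeeper constraint, the Python sketch
+ below builds a peer address and rejects two clusters that share a quorum and
+ a parent znode. It is an assumption, to be confirmed against the add_peer
+ help, that the peer is identified by a key of the form
+ quorum:clientPort:znode.parent. The host names, znode names and helper
+ functions are hypothetical.
+

```python
def cluster_key(quorum_hosts, client_port, znode_parent):
    """Join the ZooKeeper quorum, client port and parent znode into one key."""
    return "{}:{}:{}".format(",".join(quorum_hosts), client_port, znode_parent)


def check_znode_parents(master, slave):
    """Raise if two clusters share a ZooKeeper quorum and the same parent znode.

    Each argument is a (quorum_hosts, client_port, znode_parent) tuple.
    """
    same_quorum = set(master[0]) == set(slave[0]) and master[1] == slave[1]
    if same_quorum and master[2] == slave[2]:
        raise ValueError(
            "clusters sharing ZooKeeper need different zookeeper.znode.parent values"
        )


# Hypothetical hosts and znode names, for illustration only.
zk = ["zk1.example.com", "zk2.example.com", "zk3.example.com"]
master = (zk, 2181, "/hbase-master")
slave = (zk, 2181, "/hbase-slave")

check_znode_parents(master, slave)  # passes: the parent znodes differ
print(cluster_key(*slave))
```
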
+
+
+ Once you have a peer, you need to enable replication on your column families.
+ One way to do it is to alter the table and to set the scope like this:
+
+
+ disable 'your_table'
+ alter 'your_table', {NAME => 'family_name', REPLICATION_SCOPE => '1'}
+ enable 'your_table'
+
+
+ Currently, a scope of 0 (the default) means that the family is not replicated and a
+ scope of 1 means that it is. In the future, different scopes may be
+ used for routing policies.
+
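+ When several families need the same scope, the shell commands shown above
+ can be generated rather than typed by hand. The following Python sketch
+ only produces text to paste into the HBase shell; the function name is
+ hypothetical, and it issues one alter per family, mirroring the example
+ above.
+

```python
def replication_scope_commands(table, families, scope=1):
    """Return HBase shell lines that set REPLICATION_SCOPE on each family."""
    lines = ["disable '{}'".format(table)]
    for family in families:
        lines.append(
            "alter '{}', {{NAME => '{}', REPLICATION_SCOPE => '{}'}}".format(
                table, family, scope
            )
        )
    lines.append("enable '{}'".format(table))
    return lines


for line in replication_scope_commands("your_table", ["family_name"]):
    print(line)
```
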
+
+
+ To list all configured peers run the following command in the master's shell:
+
+ list_peers (as of version 0.92)
+
+
+
+
+ You can confirm that your setup works by looking at any region server's log
+ on the master cluster for lines like the following:
+
+
+ Considering 1 rs, with ratio 0.1
+ Getting 1 rs from peer cluster # 0
+ Choosing peer 10.10.1.49:62020
+
+
+ In this case it indicates that 1 region server from the slave cluster
+ was chosen for replication.
+
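+ Scanning a log for these lines can be automated. The Python sketch below
+ assumes only that the message text matches the sample above; real log lines
+ usually carry timestamps and class names as a prefix, which the search
+ ignores. The function name is hypothetical.
+

```python
import re

# Matches the "Choosing peer <host:port>" message shown in the sample log.
PEER_LINE = re.compile(r"Choosing peer (\S+)")


def chosen_peers(log_lines):
    """Return the host:port of every slave region server chosen as a peer."""
    peers = []
    for line in log_lines:
        match = PEER_LINE.search(line)
        if match:
            peers.append(match.group(1))
    return peers


sample = [
    "Considering 1 rs, with ratio 0.1",
    "Getting 1 rs from peer cluster # 0",
    "Choosing peer 10.10.1.49:62020",
]
print(chosen_peers(sample))  # ['10.10.1.49:62020']
```

+
+ An empty result for a region server that should be replicating suggests
+ that the peer was not set up or that the slave cluster is unreachable.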
+
+
+ Verifying Replicated Data
+
+ Verifying the replicated data on two clusters is easy to do in the shell when
+ looking at only a few rows, but a systematic comparison requires more
+ computing power. The VerifyReplication MapReduce job was created for this
+ purpose. It must be run on the master cluster and must be given a peer id (the
+ one provided when establishing the replication stream) and a table name. Other
+ options let you specify a time range and specific families. The job's short
+ name is "verifyrep", which must be provided when pointing "hadoop jar" to the
+ hbase jar.
+
+
+ Alternatively, run it through $HBASE_HOME/bin/hbase:
+
+
+ $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication [options] [peerid] [tablename]
+
+ To get more information about it, run:
+
+ $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --help
+
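+ For scripted runs, the command line above can be assembled
+ programmatically. The Python sketch below follows the
+ [options] [peerid] [tablename] order shown above. The install path, peer id
+ and table name are hypothetical, and the --families option is an assumption
+ that should be checked against the --help output of your version.
+

```python
VERIFY_CLASS = "org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication"


def verifyrep_command(hbase_home, peer_id, table, options=()):
    """Build the argument list: hbase <class> [options] [peerid] [tablename]."""
    hbase_bin = "{}/bin/hbase".format(hbase_home.rstrip("/"))
    return [hbase_bin, VERIFY_CLASS] + list(options) + [str(peer_id), table]


# Hypothetical install path, peer id, table and option value.
cmd = verifyrep_command(
    "/opt/hbase", 1, "your_table", options=["--families=family_name"]
)
print(" ".join(cmd))
# To launch the job for real: import subprocess; subprocess.run(cmd, check=True)
```
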
+ HBase Backup