diff --git a/hbase-server/src/main/javadoc/org/apache/hadoop/hbase/replication/package.html b/hbase-server/src/main/javadoc/org/apache/hadoop/hbase/replication/package.html
index 2f2e24a..8a42139 100644
--- a/hbase-server/src/main/javadoc/org/apache/hadoop/hbase/replication/package.html
+++ b/hbase-server/src/main/javadoc/org/apache/hadoop/hbase/replication/package.html
@@ -22,144 +22,6 @@

Multi Cluster Replication

-This package provides replication between HBase clusters.

Table Of Contents

  1. Status
  2. Requirements
  3. Deployment
  4. Verifying Replicated Data

Status

-This package is experimental-quality software and is only meant to be a base for future developments. The current implementation offers the following features:

  1. Master/Slave replication.
  2. Master/Master replication.
  3. Cyclic replication.
  4. Replication of scoped families in user tables.
  5. Start/stop of the replication stream.
  6. Support for clusters of different sizes.
  7. Handling of partitions longer than 10 minutes.
  8. Ability to add/remove slave clusters at runtime.
  9. MapReduce job to compare tables on two clusters.
-Please report any bugs you find on the project's Jira.

Requirements

-Before trying out replication, make sure to review the following requirements:

  1. Zookeeper should be managed by you, not by HBase, and should always be available during the deployment.
  2. All machines from both clusters should be able to reach every other machine, since replication goes from any region server to any region server on the slave cluster. This also includes the Zookeeper clusters.
  3. Both clusters should have the same HBase and Hadoop major revision. For example, having 0.90.1 on the master and 0.90.0 on the slave is correct, but 0.90.1 and 0.89.20100725 is not.
  4. Every table that contains families scoped for replication should exist on every cluster with exactly the same name, and the same applies to the replicated families.
  5. Multiple slaves, Master/Master replication, and cyclic replication require version 0.92 or greater.
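The requirement that both clusters share a major revision can be checked mechanically. The following is a minimal POSIX shell sketch, assuming that "major revision" means the first two dot-separated fields of the version string, which is consistent with the 0.90.1 / 0.90.0 / 0.89.20100725 example; `major_revision` and `same_major_revision` are hypothetical helpers, not part of HBase.

```shell
#!/bin/sh
# Sketch: compare the "major revision" of two version strings.
# ASSUMPTION: the major revision is the first two dot-separated fields,
# e.g. 0.90.1 -> 0.90 and 0.89.20100725 -> 0.89.

# Print the first two dot-separated fields of a version string.
major_revision() {
  local version="$1"
  local first="${version%%.*}"   # text before the first dot, e.g. "0"
  local rest="${version#*.}"     # text after the first dot, e.g. "90.1"
  local second="${rest%%.*}"     # text before the next dot, e.g. "90"
  printf '%s.%s\n' "$first" "$second"
}

# Succeed (exit status 0) when both versions share a major revision.
same_major_revision() {
  [ "$(major_revision "$1")" = "$(major_revision "$2")" ]
}
```

For instance, `same_major_revision 0.90.1 0.90.0` succeeds, while `same_major_revision 0.90.1 0.89.20100725` fails, mirroring the example in the requirements.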

Deployment

-The following steps describe how to enable replication from one cluster to another.

  1. Edit ${HBASE_HOME}/conf/hbase-site.xml on both clusters to add the following configuration:
    <property>
      <name>hbase.replication</name>
      <value>true</value>
    </property>
    Deploy the files, and then restart HBase if it was running.
  2. Run the following command in the HBase shell on the master cluster while the cluster is running:
    add_peer 'ID', 'CLUSTER_KEY'
    The ID is a string, which must not contain a hyphen. To compose the CLUSTER_KEY, use the following template:
    hbase.zookeeper.quorum:hbase.zookeeper.property.clientPort:zookeeper.znode.parent
    This sets up the replication stream between the two clusters. If both clusters use the same Zookeeper cluster, each must use a different zookeeper.znode.parent, since they cannot write to the same parent znode.
  3. Once you have a peer, you need to enable replication on your column families. One way to do this is to alter the table and set the scope like this:
      disable 'your_table'
      alter 'your_table', {NAME => 'family_name', REPLICATION_SCOPE => '1'}
      enable 'your_table'
    Currently, a scope of 0 (the default) means that a family is not replicated, and a scope of 1 means that it is. In the future, different scopes may be used for routing policies.
  4. To list all configured peers, run the following command in the HBase shell on the master cluster (as of version 0.92):
    list_peers
  5. To enable a peer that was previously disabled, run the following command in the HBase shell on the master cluster:
    enable_peer 'ID'
  6. To disable a peer, run the following command in the HBase shell on the master cluster. This causes HBase to stop sending edits to that peer cluster, but it still keeps track of all the new WALs that it will need to replicate if and when the peer is re-enabled.
    disable_peer 'ID'
  7. To remove a peer, use the following command in the HBase shell on the master cluster:
    remove_peer 'ID'
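The peer and scope steps in the list above lend themselves to scripting. The following is a minimal POSIX shell sketch that only prints HBase shell commands and never contacts a cluster; the helper names `cluster_key`, `add_peer_cmd`, and `scope_family_cmds` are hypothetical. It rejects peer IDs that contain a hyphen, builds the CLUSTER_KEY from the three-part template, and separates the `add_peer` arguments with a comma, because HBase shell commands are Ruby method calls.

```shell
#!/bin/sh
# Sketch: print HBase shell commands for setting up a replication peer.
# The helper names are hypothetical; nothing here contacts a cluster.

# Build a CLUSTER_KEY from the template
# hbase.zookeeper.quorum:hbase.zookeeper.property.clientPort:zookeeper.znode.parent
cluster_key() {
  local quorum="$1" client_port="$2" znode_parent="$3"
  printf '%s:%s:%s\n' "$quorum" "$client_port" "$znode_parent"
}

# Print an add_peer command, rejecting peer IDs that contain a hyphen.
add_peer_cmd() {
  local id="$1" key="$2"
  case "$id" in
    *-*)
      echo "invalid peer ID '$id': it must not contain a hyphen" >&2
      return 1
      ;;
  esac
  # HBase shell commands are Ruby method calls, so arguments are comma-separated.
  printf "add_peer '%s', '%s'\n" "$id" "$key"
}

# Print the disable/alter/enable sequence that sets REPLICATION_SCOPE to 1
# (replicated) on one column family of a table.
scope_family_cmds() {
  local table="$1" family="$2"
  printf "disable '%s'\n" "$table"
  printf "alter '%s', {NAME => '%s', REPLICATION_SCOPE => '1'}\n" "$table" "$family"
  printf "enable '%s'\n" "$table"
}
```

For example, `add_peer_cmd 1 "$(cluster_key zk1.example.com,zk2.example.com 2181 /hbase)" | hbase shell` would feed the generated command to the HBase shell, assuming `hbase` is on the `PATH`; the host names are placeholders.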
-You can confirm that your setup works by looking at any region server's log on the master cluster for the following lines:
-Considering 1 rs, with ratio 0.1
-Getting 1 rs from peer cluster # 0
-Choosing peer 10.10.1.49:62020
-In this case, it indicates that 1 region server from the slave cluster was chosen for replication.
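The log lines above suggest that the source picks a fraction of the slave cluster's region servers. Assuming that fraction is applied as ceil(number of slave region servers × ratio), which matches the example (ceil(1 × 0.1) = 1), the arithmetic can be sketched with the hypothetical `num_sinks` function below; awk does the floating-point work because POSIX shell arithmetic is integer-only.

```shell
#!/bin/sh
# Sketch: how many slave region servers a source would choose, ASSUMING the
# count is ceil(peer_region_servers * ratio), as the log lines suggest.
# awk does the floating-point math because shell arithmetic is integer-only.
num_sinks() {
  local peer_region_servers="$1" ratio="$2"
  awk -v n="$peer_region_servers" -v r="$ratio" \
    'BEGIN { x = n * r; c = int(x); if (c < x) c++; print c }'
}
```

Under this assumption, 25 slave region servers with a ratio of 0.1 would give 3 chosen servers, and any positive ratio yields at least one server for a non-empty slave cluster.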

Verifying Replicated Data

-Verifying the replicated data on two clusters is easy to do in the shell when looking at only a few rows, but a systematic comparison requires more computing power. This is why the VerifyReplication MapReduce job was created. It has to be run on the master cluster and must be given a peer ID (the one provided when establishing the replication stream) and a table name. Other options let you specify a time range and specific families. The job's short name is "verifyrep", which must be given as the program name when pointing "hadoop jar" at the HBase jar.
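A VerifyReplication invocation can be assembled from these pieces. The following is a minimal POSIX shell sketch, not the job's documented usage: it assumes the options precede the peer ID and table name, and it reuses the option names `--starttime`, `--stoptime`, and `--families` and the jar path `/usr/lib/hbase/hbase.jar` that appear in the Reference Guide text elsewhere in this change. `verifyrep_cmd` and the `HBASE_JAR` override variable are hypothetical.

```shell
#!/bin/sh
# Sketch: print a VerifyReplication ("verifyrep") command line without running it.
# ASSUMPTIONS: options come before the peer ID and table name, and the jar path
# defaults to /usr/lib/hbase/hbase.jar unless the hypothetical HBASE_JAR
# variable overrides it. verifyrep_cmd itself is a hypothetical helper.
# Usage: verifyrep_cmd PEER_ID TABLE [STARTTIME [STOPTIME [FAMILIES]]]
verifyrep_cmd() {
  local peer_id="$1" table="$2" start="${3:-}" stop="${4:-}" families="${5:-}"
  local cmd="hadoop jar ${HBASE_JAR:-/usr/lib/hbase/hbase.jar} verifyrep"
  # Optional filters: time range and column families (empty means "not set").
  if [ -n "$start" ]; then cmd="$cmd --starttime=$start"; fi
  if [ -n "$stop" ]; then cmd="$cmd --stoptime=$stop"; fi
  if [ -n "$families" ]; then cmd="$cmd --families=$families"; fi
  # Required positional arguments: the peer ID, then the table to verify.
  printf '%s %s %s\n' "$cmd" "$peer_id" "$table"
}
```

Printing the command instead of running it keeps the sketch usable without a cluster; the printed line can be reviewed and then run on the master cluster.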

+Cluster replication documentation has been moved to the link:http://hbase.apache.org/book.html#_cluster_replication[Cluster Replication] section of the link:http://hbase.apache.org/book.html[Apache HBase Reference Guide].

diff --git a/src/main/asciidoc/_chapters/ops_mgt.adoc b/src/main/asciidoc/_chapters/ops_mgt.adoc
index 1402f52..e1731f2 100644
--- a/src/main/asciidoc/_chapters/ops_mgt.adoc
+++ b/src/main/asciidoc/_chapters/ops_mgt.adoc
@@ -1333,6 +1333,33 @@ hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication [--startti
 +
 The `VerifyReplication` command prints out `GOODROWS` and `BADROWS` counters to indicate rows that did and did not replicate correctly.

+=== Managing Cluster Replication
+
+NOTE: This material was previously available in the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/replication/package-summary.html#requirements[Replication API documentation].
+
+==== Managing Replication Relationships
+Several HBase Shell commands are available to assist you in managing replication relationships between clusters (also called peers).
+
+add_peer ::
+  Adds a replication relationship between two clusters. +
+  * ID: a unique string, which must not contain a hyphen.
+  * CLUSTER_KEY: composed using the following template, with appropriate placeholders: `hbase.zookeeper.quorum:hbase.zookeeper.property.clientPort:zookeeper.znode.parent`
+list_peers:: List all replication relationships known by this cluster.
+enable_peer ::
+  Enable a previously disabled replication relationship.
+disable_peer ::
+  Disable a replication relationship. HBase will no longer send edits to that peer cluster, but it still keeps track of all the new WALs that it will need to replicate if and when it is re-enabled.
+remove_peer ::
+  Disable and remove a replication relationship. HBase will no longer send edits to that peer cluster or keep track of WALs.
+
+==== Verifying Replicated Data
+
+The `VerifyReplication` MapReduce job, which is included in HBase, performs a systematic comparison of replicated data between two different clusters. Run the VerifyReplication job on the master cluster, supplying it with the peer ID and table name to use for validation. You can limit the verification further by specifying a time range or specific families. The job's short name is `verifyrep`. To run the job, use a command like the following:
+
+----
+$ hadoop jar /usr/lib/hbase/hbase.jar verifyrep --starttime=
+--stoptime= --families=
+----
 === Detailed Information About Cluster Replication