In some use cases, HBase clients write to tables on separate clusters (probably in different datacenters) for redundancy. As an administrator/application architect, I would like to find out whether the tables on both clusters are in the same state (cell by cell). One readily available tool is VerifyRep, which is part of replication.
However, it requires a peerId to be set up on at least one of the involved clusters. A peerId is unnecessary in this scenario and could cause unintended consequences, since the clusters are not really replication peers, nor do we want them to be.
Looking at the code:
The tool uses the peerId only to obtain the clusterKey, which is essentially the ZooKeeper quorum URL.
So I propose updating the tool to accept the remote cluster's ZooKeeper quorum as an argument (e.g. --peerQuorumAddress clusterBzk1,clusterBzk2,clusterBzk3:2181/hbase-secure) and use it directly, without depending on a replication peerId, similar to peerFSAddress. There are several advantages in doing so:
- Avoid developing and maintaining a separate tool for the above scenario
- Make the tool useful for other scenarios as well, such as:
  - validating backups in a remote cluster (HBASE-19106)
  - comparing a cloned tableA with the original tableA in the same or a remote cluster, in case of user error, before restoring a snapshot to the original table, to find the records that need to be added or that are invalid, missing, etc.
- Allow backup operators who are not HBase admins (and who shouldn't be adding a peerId) to run the tool, since currently only an HBase superuser can add a peerId, for reasons discussed in HBASE-21163.
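To make the proposed argument concrete, here is a minimal sketch of how a value such as clusterBzk1,clusterBzk2,clusterBzk3:2181/hbase-secure could be split into its parts. The class name PeerQuorumAddress and the hosts:port/znode layout are assumptions taken from the example above; this is not existing VerifyRep code, and the final format may well differ.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of HBase): parses the value of the proposed
// --peerQuorumAddress option, assuming the form "host1,host2,host3:port/znodeParent".
class PeerQuorumAddress {
    final List<String> hosts;   // ZooKeeper quorum hosts
    final int clientPort;       // ZooKeeper client port
    final String znodeParent;   // parent znode, e.g. "/hbase-secure"

    private PeerQuorumAddress(List<String> hosts, int clientPort, String znodeParent) {
        this.hosts = hosts;
        this.clientPort = clientPort;
        this.znodeParent = znodeParent;
    }

    static PeerQuorumAddress parse(String address) {
        // The port follows the last ':'; everything before it is the host list.
        int colon = address.lastIndexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("Expected hosts:port/znode, got: " + address);
        }
        String hostPart = address.substring(0, colon);
        String rest = address.substring(colon + 1);

        // The znode parent starts at the first '/' after the port.
        int slash = rest.indexOf('/');
        if (slash <= 0) {
            throw new IllegalArgumentException("Expected port/znode after ':', got: " + rest);
        }
        int port = Integer.parseInt(rest.substring(0, slash));
        String znode = rest.substring(slash);

        return new PeerQuorumAddress(Arrays.asList(hostPart.split(",")), port, znode);
    }
}
```

With something like this, the tool could build the remote cluster's connection settings from PeerQuorumAddress.parse(...) instead of looking up the clusterKey through a peerId.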
Please post your comments.