HBase / HBASE-21201

Support to run VerifyReplication MR tool without peerid


Details

      The VerifyReplication tool now accepts a peerQuorumAddress in place of a peerId, so a replication peer no longer has to be set up before the tool can be used.

      For example:
      hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication zk1,zk2,zk3:2181/hbase testTable

    Description

      In some use cases, HBase clients write to tables in separate clusters (probably in different datacenters) for redundancy. As an administrator or application architect, I would like to find out whether the tables in both clusters are in the same state (cell by cell). One readily available tool is VerifyRep, which is part of replication.

      However, it requires a peerId to be set up on at least one of the involved clusters. A peerId is unnecessary in this scenario and could cause unintended consequences, as the clusters aren't really replication peers, nor do we want them to be.

      Looking at the code:

      The tool only attempts to get the clusterKey, which is essentially the ZooKeeper quorum URL:

       

      // VerifyReplication.java
      private static Pair<ReplicationPeerConfig, Configuration> getPeerQuorumConfig(
          final Configuration conf, String peerId)
        // ...
        return Pair.newPair(peerConfig,
            ReplicationUtils.getPeerClusterConfiguration(peerConfig, conf));

      // ReplicationUtils.java
      public static Configuration getPeerClusterConfiguration(
          ReplicationPeerConfig peerConfig, Configuration baseConf) throws ReplicationException {
        Configuration otherConf;
        try {
          otherConf = HBaseConfiguration.createClusterConf(baseConf, peerConfig.getClusterKey());
          // ...
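
      To make the "clusterKey is essentially a quorum URL" point concrete, here is a minimal standalone sketch that splits such a key into ZooKeeper hosts, client port and znode parent. The class name ClusterKeySketch and its fields are hypothetical and not part of HBase. The accepted format is an assumption based on the examples in this issue (hosts:port/znode, with an optional colon before the znode path). HBase's own cluster-key validation may be stricter or differ in detail.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not an HBase class): splits a cluster key into its parts. */
class ClusterKeySketch {
    final List<String> hosts;   // ZooKeeper quorum hosts
    final int port;             // ZooKeeper client port
    final String znodeParent;   // parent znode, e.g. "/hbase"

    ClusterKeySketch(List<String> hosts, int port, String znodeParent) {
        this.hosts = hosts;
        this.port = port;
        this.znodeParent = znodeParent;
    }

    /**
     * Assumed format: host1,host2,host3:port/znode or host1,host2,host3:port:/znode.
     * Throws IllegalArgumentException when the key does not match that shape.
     */
    static ClusterKeySketch parse(String key) {
        int slash = key.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("missing znode parent: " + key);
        }
        String znode = key.substring(slash);
        String hostPort = key.substring(0, slash);
        // Tolerate the "quorum:port:/znode" spelling by dropping the trailing colon.
        if (hostPort.endsWith(":")) {
            hostPort = hostPort.substring(0, hostPort.length() - 1);
        }
        int colon = hostPort.lastIndexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("missing quorum or client port: " + key);
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        List<String> hosts = Arrays.asList(hostPort.substring(0, colon).split(","));
        return new ClusterKeySketch(hosts, port, znode);
    }
}
```

      A string with no znode path, such as a plain peerId, is rejected by this sketch, which is one simple way to tell the two kinds of argument apart.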

       

       

      So I would like to propose updating the tool to accept the remote cluster's ZK quorum as an argument (e.g. --peerQuorumAddress clusterBzk1,clusterBzk2,clusterBzk3:2181/hbase-secure) and use it directly, without depending on a replication peerId, similar to peerFSAddress. There are certain advantages in doing so:

      • Reduce the development/maintenance of a separate tool for the above scenario
      • Allow the tool to be more useful for other scenarios as well, such as
        • validating backups in a remote cluster (HBASE-19106)
        • comparing a cloned tableA with the original tableA in the same or a remote cluster, in case of user error, before restoring a snapshot to the original table, to find the records that need to be added or that are invalid, missing, etc.
        • allowing backup operators who are not HBase admins (and who shouldn't be adding a peerId) to run the tool, since currently only an HBase superuser can add a peerId, for reasons discussed in HBASE-21163.
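
      As a rough illustration of the proposed option, the argument handling could look something like the sketch below. VerifyArgsSketch and its fields are hypothetical names, and the flag-followed-by-value form is taken from the example in this proposal. The note at the top of this issue shows the address passed positionally instead, so the real tool's parsing may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical argument holder (not the real VerifyReplication parser). */
class VerifyArgsSketch {
    String peerQuorumAddress; // null when the flag is not supplied
    String peerId;            // null when a quorum address is used instead
    String tableName;

    static VerifyArgsSketch parse(String[] args) {
        VerifyArgsSketch out = new VerifyArgsSketch();
        List<String> positional = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("--peerQuorumAddress")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("--peerQuorumAddress needs a value");
                }
                out.peerQuorumAddress = args[++i]; // consume the flag's value
            } else {
                positional.add(args[i]);
            }
        }
        if (out.peerQuorumAddress != null) {
            // With an explicit quorum address, only the table name remains.
            if (positional.size() != 1) {
                throw new IllegalArgumentException("expected: --peerQuorumAddress <address> <tablename>");
            }
            out.tableName = positional.get(0);
        } else {
            // Otherwise fall back to the existing <peerid> <tablename> form.
            if (positional.size() != 2) {
                throw new IllegalArgumentException("expected: <peerid> <tablename>");
            }
            out.peerId = positional.get(0);
            out.tableName = positional.get(1);
        }
        return out;
    }
}
```

      Keeping the peerId path as the fallback means existing invocations of the tool would keep working unchanged under this sketch.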

      Please post your comments

      Thanks

      cc: clayb, brfrn169 , vrodionov , rashidaligee

      Attachments

        1. HBASE-21201.master.001.patch
          11 kB
          Toshihiro Suzuki
        2. HBASE-21201.master.002.patch
          8 kB
          Toshihiro Suzuki
        3. HBASE-21201.master.003.patch
          13 kB
          Toshihiro Suzuki
        4. HBASE-21201.master.003.patch
          13 kB
          Toshihiro Suzuki
        5. HBASE-21201.master.004.patch
          14 kB
          Toshihiro Suzuki

            People

              brfrn169 Toshihiro Suzuki
              sujit Sujit P
              Votes: 1
              Watchers: 8
