HBase / HBASE-8751

Enable a peer cluster to choose/change the column families/tables it actually wants to replicate from a source cluster


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.98.1, 0.99.0
    • Component/s: Replication
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      From the shell's doc:
      # set table / table-cf to be replicable for a peer, for a table without
      # an explicit column-family list, all replicable column-families (with
      # replication_scope == 1) will be replicated
       hbase> set_peer_tableCFs '2', "table1; table2:cf1,cf2; table3:cfA,cfB"
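The tableCFs string format used by the shell commands can be illustrated with a small parser. This is a hypothetical Python sketch, not HBase's actual parsing code: the function name `parse_table_cfs` is illustrative, and mapping a table without a cf list to an empty list (read as "all replicable column families of that table") is an assumption based on the shell documentation quoted above.

```python
from typing import Dict, List


def parse_table_cfs(spec: str) -> Dict[str, List[str]]:
    """Parse a tableCFs string such as "table1; table2:cf1,cf2".

    Hypothetical sketch: entries are separated by ';', and each entry is a
    table name optionally followed by ':' and a comma-separated cf list.
    A table with no cf list maps to an empty list, which here stands for
    "all replicable cfs (replication_scope == 1) of that table".
    """
    result: Dict[str, List[str]] = {}
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty entries, e.g. a trailing ';'
        table, sep, cfs = entry.partition(":")
        table = table.strip()
        if not table:
            raise ValueError(f"missing table name in entry {entry!r}")
        # Only split the cf part when a ':' was actually present.
        cf_list = [cf.strip() for cf in cfs.split(",") if cf.strip()] if sep else []
        result[table] = cf_list
    return result


print(parse_table_cfs("table1; table2:cf1,cf2; table3:cfA,cfB"))
# {'table1': [], 'table2': ['cf1', 'cf2'], 'table3': ['cfA', 'cfB']}
```

Representing "whole table" as an empty list keeps the parsed form close to the string syntax; a real implementation might instead use a sentinel or a nullable list.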

    Description

      Consider the following scenario (all column families have replication-scope=1):

      1) Cluster S has 3 tables: table A has cfA,cfB; table B has cfX,cfY; table C has cf1,cf2.

      2) Cluster X wants to replicate table A:cfA, table B:cfX and all of table C from cluster S.

      3) Cluster Y wants to replicate table B:cfY and table C:cf2 from cluster S.

      The current replication implementation can't achieve this, since it pushes the data of all replicable column families from cluster S to all of its peers (X and Y in this scenario).

      This improvement provides a fine-grained replication scheme that enables each peer cluster to choose the column families/tables it actually wants from the source cluster:

      A) Set the table:cf-list for a peer when adding it with add_peer:
      hbase-shell> add_peer '3', "zk:1100:/hbase", "table1; table2:cf1,cf2; table3:cf2"

      B) View the table:cf-list configuration of a peer using show_peer_tableCFs:
      hbase-shell> show_peer_tableCFs "1"

      C) Change/set the table:cf-list of a peer using set_peer_tableCFs:
      hbase-shell> set_peer_tableCFs '2', "table1:cfX; table2:cf1; table3:cf1,cf2"

      In this scheme, replication-scope=1 only means a column family CAN be replicated to other clusters; the table:cf-list alone determines WHICH cf/table is actually replicated to a specific peer.

      For backward compatibility, an empty table:cf-list replicates all replicable cf/tables. (This means a peer that replicates nothing from the source cluster is not allowed, which we think is reasonable: if nothing is replicated, why add the peer at all?)
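The selection rule above (replication-scope=1 as a precondition, the per-peer table:cf-list as the actual filter, and an empty list meaning "everything replicable") can be sketched as a per-edit filter and applied to the scenario at the start of this description. This is a hypothetical Python sketch, not the patch's implementation: `should_replicate` and `shipped` are illustrative names, peer configs are written directly as dicts equivalent to the tableCFs strings "A:cfA; B:cfX; C" (cluster X) and "B:cfY; C:cf2" (cluster Y), and peer Z is a hypothetical third peer with an empty list, added only to show the backward-compatible default.

```python
from typing import Dict, List, Set, Tuple

# Per-peer config: table name -> cf list. An empty cf list for a table means
# "all replicable cfs of that table"; an empty dict means "all replicable
# cf/tables" (the backward-compatible default).
TableCFs = Dict[str, List[str]]


def should_replicate(table: str, cf: str, scope: int, peer_cfg: TableCFs) -> bool:
    """Decide whether an edit to table:cf is shipped to one peer (sketch)."""
    if scope != 1:
        return False  # scope=1 only means the cf CAN be replicated
    if not peer_cfg:
        return True  # empty table:cf-list -> every replicable cf/table
    if table not in peer_cfg:
        return False  # this peer did not select the table
    cfs = peer_cfg[table]
    return not cfs or cf in cfs  # no explicit cf list -> whole table


# Cluster S from the scenario; every cf has replication-scope=1.
source: Dict[str, List[str]] = {
    "A": ["cfA", "cfB"], "B": ["cfX", "cfY"], "C": ["cf1", "cf2"],
}
peers: Dict[str, TableCFs] = {
    "X": {"A": ["cfA"], "B": ["cfX"], "C": []},  # "A:cfA; B:cfX; C"
    "Y": {"B": ["cfY"], "C": ["cf2"]},           # "B:cfY; C:cf2"
    "Z": {},                                      # hypothetical: empty list
}


def shipped(peer_cfg: TableCFs) -> Set[Tuple[str, str]]:
    """All (table, cf) pairs of cluster S that this peer would receive."""
    return {(t, cf) for t, cfs in source.items() for cf in cfs
            if should_replicate(t, cf, 1, peer_cfg)}


for name, cfg in peers.items():
    print(name, sorted(shipped(cfg)))
```

Running it, X receives A:cfA, B:cfX and both column families of table C; Y receives only B:cfY and C:cf2; Z receives all six replicable column families. A cf with a scope other than 1 is never shipped, whatever the peer's list says.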

      This improvement addresses exactly the problem raised in the first FAQ entry at http://hbase.apache.org/replication.html:
      "GLOBAL means replicate? Any provision to replicate only to cluster X and not to cluster Y? or is that for later?
      Yes, this is for much later."

      I also noticed someone mentioned that "replication-scope" is an integer rather than a boolean precisely to allow this kind of fine-grained replication, but I think extending "replication-scope" can't achieve the same flexibility in replication granularity as the per-peer replication configuration described above.

      This improvement has been running smoothly in our production clusters (Xiaomi) for several months.

    Attachments

      1. HBASE-8751-0.94-V0.patch (39 kB, Honghua Feng)
      2. HBASE-8751-0.94-v1.patch (39 kB, Honghua Feng)
      3. HBASE-8751-trunk_v0.patch (37 kB, Honghua Feng)
      4. HBASE-8751-trunk_v1.patch (43 kB, Honghua Feng)
      5. HBASE-8751-trunk_v2.patch (43 kB, Honghua Feng)
      6. HBASE-8751-trunk_v3.patch (44 kB, Honghua Feng)


    People

      Assignee: Honghua Feng (fenghh)
      Reporter: Honghua Feng (fenghh)
      Votes: 0
      Watchers: 15
