HBASE-6453: HBase Replication point in time feature

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.94.0
    • Fix Version/s: None
    • Component/s: Replication
    • Labels: None

      Description

      Currently there is no way to control the point from which HBase replication starts to take effect. This patch lets us set a timestamp filter: any row whose timestamp is below the filter value is not replicated. The timestamp can also be shown and deleted from the HBase shell if it needs to be changed.
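
The filtering rule described above can be illustrated with a small model. This is only a sketch of the stated semantics, not code from the patch: the `KeyValue` class and `filter_for_replication` function are hypothetical names, and treating a cell whose timestamp equals the filter value as replicated is an assumption based on the wording "below this time stamp".

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class KeyValue:
    """Hypothetical stand-in for an HBase cell; only the fields needed here."""
    row: str
    timestamp: int  # milliseconds since the epoch, as in the shell examples
    value: str


def filter_for_replication(kvs: Iterable[KeyValue], time_filter: int) -> List[KeyValue]:
    """Return the cells that would still be replicated.

    Cells with a timestamp strictly below time_filter are dropped.
    A filter of 0 therefore lets every cell through.
    """
    return [kv for kv in kvs if kv.timestamp >= time_filter]


if __name__ == "__main__":
    cells = [
        KeyValue("r1", 1329896850046, "old"),
        KeyValue("r2", 1329896850047, "edge"),
        KeyValue("r3", 1329896850048, "new"),
    ]
    for kv in filter_for_replication(cells, 1329896850047):
        print(kv.row, kv.timestamp)
```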

        Activity

        terry zhang added a comment -
        hbase(main):001:0> set_timefilter
        
        ERROR: wrong number of arguments (0 for 2)
        
        Here is some help for this command:
        Set a peer cluster time filter to replicate to, the row which time stamp is before
        the timestamp will be filtered.
        
        Examples:
        
          hbase> set_timefilter '1', "1329896850047"
          hbase> set_timefilter '2', "1329896850047"
        
        
        hbase(main):002:0> set_timefilter '1',"1329896850047"
        0 row(s) in 0.3000 seconds
        

        This sets the timestamp to 1329896850047. All KVs earlier than 1329896850047 will then be filtered.

        terry zhang added a comment -
        hbase(main):003:0> get_timefilter '1'
        PEER_ID                            TIME_FILTER                                                                                      
         1                                 1329896850047     
        

        We can show the timestamp by cluster (peer) ID and check that it is set correctly.

        terry zhang added a comment -

        We can also drop the time filter. After it is dropped, the timestamp changes to zero and all KVs are replicated.

        hbase(main):004:0> drop_timefilter '1'
        0 row(s) in 0.0030 seconds
        
        hbase(main):005:0> get_timefilter '1'
        PEER_ID                            TIME_FILTER                                                                                      
         1                                 0     
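
The set/get/drop behaviour shown in these shell sessions can be summarised with a small in-memory model. It is a sketch under assumptions, not the patch itself: the class `PeerTimeFilters` and its storage are hypothetical (the thread suggests the real value is kept in ZooKeeper), and only the observable behaviour is taken from the transcripts, namely that the timestamp is passed as a string and that a dropped filter reads back as 0.

```python
from typing import Dict


class PeerTimeFilters:
    """Hypothetical in-memory model of per-peer replication time filters."""

    def __init__(self) -> None:
        self._filters: Dict[str, int] = {}

    def set_timefilter(self, peer_id: str, timestamp: str) -> None:
        # The shell examples pass the timestamp as a string, so parse it here.
        ts = int(timestamp)
        if ts < 0:
            raise ValueError("timestamp must be non-negative")
        self._filters[peer_id] = ts

    def get_timefilter(self, peer_id: str) -> int:
        # Assumption: a peer with no filter behaves like one with filter 0.
        return self._filters.get(peer_id, 0)

    def drop_timefilter(self, peer_id: str) -> None:
        # Dropping resets the filter to 0, so every KV passes again.
        self._filters[peer_id] = 0

    def should_replicate(self, peer_id: str, kv_timestamp: int) -> bool:
        return kv_timestamp >= self.get_timefilter(peer_id)


if __name__ == "__main__":
    filters = PeerTimeFilters()
    filters.set_timefilter("1", "1329896850047")
    print(filters.get_timefilter("1"))
    filters.drop_timefilter("1")
    print(filters.get_timefilter("1"))
```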
        
        stack added a comment -

        So idea is you'd set this timestamp and we'd replicate from that time on?

        What is the use case? Why not have replication happen when enabled?

        You write the timestamp to zk under clusterid znode? Into a znode named timefilter? The name should be timestamp? Or start_timestamp?

        That's great you added shell commands for setting/getting. Should they have replication mentioned in the command name so they are known to be a replication facility? Maybe it's not necessary as long as these new commands are grouped w/ other replication commands (that is so?)

        Thanks for the new feature Terry.

        terry zhang added a comment -

        Hi Stack. I think the point-in-time feature can be used together with the Snapshots feature (HBASE-6055) in the following case:
        1. The Master cluster in the China data center takes a snapshot at time A.
        2. Copy the snapshot to the Slave cluster in the US data center and set the replication timestamp to A.
        3. Restore the snapshot (HBASE-6230) on the Slave cluster and start replication.

        Then the Slave cluster's data will be the same as the Master cluster's. The data is safer, and US users can read from the Slave cluster (gets and scans), which reduces the load on the China data center. Simply enabling replication gives no control over the exact start time, so it may lose some data or replicate some useless data. MySQL also has a point-in-time/position feature in its replication framework. It is very convenient for data center administrators.

        We can pick a better name for this operation, since I am not good at naming ...
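
The snapshot-plus-timestamp workflow in the steps above can be checked with a toy model. This is a sketch under simplifying assumptions, not a description of HBase internals: edits are plain `(row, timestamp)` pairs, a snapshot taken at time A is assumed to hold exactly the edits with a timestamp before A, and replication is assumed to ship exactly the edits with a timestamp at or after its start point. All function names are hypothetical.

```python
from typing import List, Set, Tuple

Edit = Tuple[str, int]  # (row key, timestamp)


def snapshot(edits: List[Edit], at_time: int) -> Set[Edit]:
    """Edits captured by a snapshot taken at at_time (strictly earlier edits)."""
    return {e for e in edits if e[1] < at_time}


def replicated(edits: List[Edit], start_ts: int) -> Set[Edit]:
    """Edits shipped when replication effectively starts at start_ts."""
    return {e for e in edits if e[1] >= start_ts}


def slave_state(edits: List[Edit], snapshot_time: int,
                replication_start: int) -> Tuple[Set[Edit], Set[Edit]]:
    """Return (missing, redundant) edits on the slave cluster.

    missing:   edits on the master that never reach the slave
    redundant: edits shipped by replication although the snapshot has them
    """
    restored = snapshot(edits, snapshot_time)
    shipped = replicated(edits, replication_start)
    missing = set(edits) - restored - shipped
    redundant = restored & shipped
    return missing, redundant


if __name__ == "__main__":
    master_edits = [("r1", 100), ("r2", 200), ("r3", 300), ("r4", 400)]
    # Replication timestamp set to the snapshot time A = 250.
    print(slave_state(master_edits, 250, 250))
    # Replication happens to start later than the snapshot.
    print(slave_state(master_edits, 250, 350))
    # Replication happens to start earlier than the snapshot.
    print(slave_state(master_edits, 250, 150))
```

In this model, aligning the replication timestamp with the snapshot time leaves nothing missing and nothing redundant, while an uncontrolled start time either loses edits or re-ships edits the snapshot already contains.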


          People

          • Assignee: terry zhang
          • Reporter: terry zhang
          • Votes: 0
          • Watchers: 4
