HBase / HBASE-9360

Enable 0.94 -> 0.96 replication to minimize upgrade down time

    Details

    • Type: Brainstorming
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.98.0, 0.96.0
    • Fix Version/s: None
    • Component/s: migration
    • Labels:
      None

      Description

      As we know, 0.96 is a singularity release. As of today, a 0.94 HBase user has to do an in-place upgrade: make the corresponding client changes, recompile the client application code, fully shut down the existing 0.94 HBase cluster, deploy the 0.96 binaries, run the upgrade script, and then start the upgraded cluster. You can imagine that the downtime will be extended if something goes wrong in between.

      To minimize the downtime, another possible way is to set up a secondary 0.96 cluster and then set up replication between the existing 0.94 cluster and the new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch the traffic to the 0.96 cluster and decommission the old one.

      The ideal steps will be:

      1) Set up a 0.96 cluster
      2) Set up replication from the running 0.94 cluster to the newly created 0.96 cluster
      3) Wait until the two clusters are in sync through replication
      4) Start duplicating writes to both the 0.94 and 0.96 clusters (replication could be stopped at this point)
      5) Forward read traffic to the slave 0.96 cluster
      6) After a certain period, if everything looks good, stop writes to the original 0.94 cluster to complete the upgrade

      To get us there, there are two tasks:

      1) Enable replication from 0.94 -> 0.96
      I've run the idea by Jean-Daniel Cryans, Devaraj Das and Nick Dimiduk. Currently it seems the best approach is to build a very similar service to, or a service on top of, https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep that supports three commands: replicateLogEntries, multi and delete. Inside the three commands, we just pass the corresponding requests down to the destination 0.96 cluster, acting as a bridge. The reason to support multi and delete is so that CopyTable can copy data from a 0.94 cluster to a 0.96 one.

      The other approach is to provide limited support for the 0.94 RPC protocol in 0.96. One issue with this is that a 0.94 client needs to talk to ZooKeeper first before it can connect to a 0.96 region server. Therefore, we would need a faked ZooKeeper setup in front of a 0.96 cluster for a 0.94 client to connect to. It may also pollute the 0.96 code base with 0.94 RPC code.

      2) To support writes to a 0.96 cluster and a 0.94 cluster at the same time, we need to load both HBase clients into one single JVM using different class loaders.

      Let me know if you think this is worth doing and whether there is any better approach we could take.
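The bridge idea from task 1 can be illustrated with a small model. This is only a sketch of the pass-through shape, not the real service: `FakeDestinationCluster`, `ReplicationBridge` and the tuple layout of an edit are hypothetical names invented for illustration, and no HBase or hbase-sep API is used.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (operation, table, row key, value); value is None for deletes.
# This tuple layout is an assumption made for the sketch.
Edit = Tuple[str, str, bytes, Optional[bytes]]


@dataclass
class FakeDestinationCluster:
    """Hypothetical in-memory stand-in for a client of the 0.96 cluster."""
    rows: Dict[Tuple[str, bytes], bytes] = field(default_factory=dict)

    def put(self, table: str, row: bytes, value: bytes) -> None:
        self.rows[(table, row)] = value

    def delete(self, table: str, row: bytes) -> None:
        self.rows.pop((table, row), None)


class ReplicationBridge:
    """Accepts the three commands named above and forwards each one as-is."""

    def __init__(self, destination: FakeDestinationCluster) -> None:
        self.destination = destination

    def _forward(self, edit: Edit) -> None:
        op, table, row, value = edit
        if op == "put" and value is not None:
            self.destination.put(table, row, value)
        elif op == "delete":
            self.destination.delete(table, row)
        else:
            raise ValueError(f"unsupported edit: {edit!r}")

    def replicate_log_entries(self, entries: List[Edit]) -> int:
        # Called by the 0.94 replication source; returns how many edits were forwarded.
        for entry in entries:
            self._forward(entry)
        return len(entries)

    def multi(self, actions: List[Edit]) -> int:
        # Batched client mutations, which a tool like CopyTable would send.
        for action in actions:
            self._forward(action)
        return len(actions)

    def delete(self, table: str, row: bytes) -> None:
        self.destination.delete(table, row)
```

The bridge keeps no state of its own; every command is translated into a call on the destination client, which is why the source cluster can treat it like an ordinary slave.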

      Thanks!

        Issue Links

          Activity

          Andrew Purtell added a comment -

          I would be interested in forward porting something that showed up for 0.96 to 0.98.

          Francis Liu added a comment -

          Sorry, late to the party. Stack mentioned this. We have a different approach: we extended the replication source/sink to use a Thrift client/server to ship/receive the edits. We plan on using it for 0.94<>0.96 replication as well as for encrypting replication communication. We currently have 0.94<>0.94; the next step is 0.94<>0.96. If people are interested, I can try to share the code once we have things stable, though 0.94<>0.96 might come a bit later.

          Jeffrey Zhong added a comment -

          Good suggestions. Let me add a paragraph on this in the ref guide in upgrade & replication.

          Nick Dimiduk added a comment -

          A new section in http://hbase.apache.org/book.html#upgrade0.96 plus a note on user@ should do the trick.

          stack added a comment -

          Should at least doc its existence in refguide? Or play a loud trumpet so a bunch get to hear about it (Or post on user list?)

          Nick Dimiduk added a comment -

          Should this ticket be resolved then? Is this code we can put in a contrib directory, similar to our dev-support directory, or are we happy with it in an external repo?

          Jeffrey Zhong added a comment -

          The work is done. I'd love to check the code (https://github.com/hortonworks/HBaseReplicationBridgeServer) into HBase, but it can't go in because it would pollute the 0.96 code base with 0.94 RPC code. If you want to try out 0.94->0.96 replication, you can follow the README at https://github.com/hortonworks/HBaseReplicationBridgeServer to see if it works for you.

          Thanks.

          Jean-Marc Spaggiari added a comment -

          Hi there,

          any progress on this JIRA? I still have 2 clusters (One 0.94, one 0.96) that I can use to try that...

          JM

          Jeffrey Zhong added a comment -

          Thanks, Jean-Marc Spaggiari! Very nice of you to try it out. To your question:

          "If writes are happening in 0.94, how are you perfectly sure of the start time of the replication?"

          This is the same as a normal replication setup. Say replication is turned on for all tables in the source 0.94 cluster at timestamp T1. When you export data from the source 0.94 cluster, you can export data up to T1+15secs (a buffer to cover clock drift). This means replication and import overlap by a small time window, which guarantees that all the data is copied over. The overlap is safe because replication and import copy data using only puts and deletes, which are idempotent.
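The overlap argument can be checked with a toy model. The sketch below is an illustration under stated assumptions, not HBase code: cells are keyed by (row, timestamp), only puts are modeled, a cell's timestamp is treated as its write time, and the export end time is treated as exclusive. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Put = Tuple[str, int, bytes]          # (row key, cell timestamp in ms, value)
Store = Dict[Tuple[str, int], bytes]  # cells keyed by (row key, timestamp)

OVERLAP_MS = 15_000  # the 15-second buffer suggested above


def export_end_time(replication_on_ms: int, overlap_ms: int = OVERLAP_MS) -> int:
    """End of the export window: T1 plus the clock-drift buffer."""
    return replication_on_ms + overlap_ms


def apply_puts(store: Store, puts: Iterable[Put]) -> Store:
    """Apply timestamped puts; writing the same cell twice changes nothing."""
    for row, ts, value in puts:
        store[(row, ts)] = value
    return store


def copy_with_overlap(source_puts: List[Put], replication_on_ms: int) -> Store:
    """Build the destination from an export plus the replication stream."""
    end = export_end_time(replication_on_ms)
    exported = [p for p in source_puts if p[1] < end]                   # bulk copy
    replicated = [p for p in source_puts if p[1] >= replication_on_ms]  # live stream
    dest: Store = {}
    apply_puts(dest, replicated)  # order does not matter because puts are idempotent
    apply_puts(dest, exported)
    return dest
```

Edits that fall inside the window between T1 and T1+15s arrive twice, once through each path, and the destination still ends up identical to the source.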

          Jean-Marc Spaggiari added a comment -

          Hi Jeffrey Zhong, I have a 0.94.12+1.0.3 cluster and a 0.96.0+2.2.0 cluster. I will most probably be able to give this a try. We can move the discussion to the mailing list if you want. So far, as a test, I have stopped 0.94, run a distcp from 0.94 to 0.96, and started 0.96, since I'm still using 0.94 (I need to try my MR jobs in 0.96 first). So I can clean 0.96 and give the procedure above a try.

          My only question is: if writes are happening in 0.94, how are you perfectly sure of the start time of the replication? It might be one ms later or earlier than you think it is. Is this information stored somewhere?

          Jeffrey Zhong added a comment -

          With HBASE-9895 checked in, a user can set up replication from a 0.94 to a 0.96 HBase cluster as follows:

          1) Set up HBaseReplicationBridgeServer
          2) Set up replication between the source HBase 0.94 cluster and the destination HBaseReplicationBridgeServer instances, and mark the time
          3) Export data from the source 0.94 cluster up to the time replication was set up
          4) Import the data exported in the previous step into the 0.96 cluster while replication is on
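Steps 3 and 4 can be sketched as command lines built from the marked time. This is a hedged sketch: it relies on the stock Export and Import MapReduce tools and their documented argument order (`<tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]` for Export), while the helper names, table name, paths and timestamp in the usage note are hypothetical. Moving the exported files between the two clusters is not shown.

```python
from typing import List

EXPORT_CLASS = "org.apache.hadoop.hbase.mapreduce.Export"
IMPORT_CLASS = "org.apache.hadoop.hbase.mapreduce.Import"


def export_command(table: str, output_dir: str, end_time_ms: int,
                   versions: int = 1, start_time_ms: int = 0) -> List[str]:
    """Argument vector to run on the 0.94 side, bounded by the marked time."""
    return ["hbase", EXPORT_CLASS, table, output_dir,
            str(versions), str(start_time_ms), str(end_time_ms)]


def import_command(table: str, input_dir: str) -> List[str]:
    """Argument vector to run on the 0.96 side while replication stays on."""
    return ["hbase", IMPORT_CLASS, table, input_dir]
```

For example, `export_command("usertable", "/export/usertable", 1385000015000)` yields the arguments for an export that stops at the marked time; the result can be passed to `subprocess.run` on a node with the `hbase` launcher on its path.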

          Thanks.

          Jeffrey Zhong made changes -
          Link: This issue relates to HBASE-9895
          Jeffrey Zhong added a comment -

          I did a prototype at https://github.com/hortonworks/HBaseReplicationBridgeServer and tested replication from a 0.94 cluster to a 0.96 cluster.

          The remaining work is to bring the slave cluster to a good base (for which we're currently using CopyTable) when setting up replication.

          Support import from a 0.94 export sequence file.

          Because org.apache.hadoop.hbase.client.Result is now protobuf-serialized (PBed), a 0.96 cluster cannot currently import files exported from 0.94, but we can easily add that support. (This is my personally preferred option.)

          Use snapshot without any code changes:

          1) Set up replication without starting the replication bridge servers, so that the source cluster queues up WALs
          2) Use a snapshot to bring the destination cluster to a good base
          3) Start the replication bridge servers to drain the WALs queued up since step 1
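The ordering in the snapshot option matters: WAL queuing has to begin before the snapshot is taken. The toy model below illustrates why, under the same kind of assumptions as any simulation: edits are timestamped puts keyed by (row, timestamp), and `bootstrap_destination` is a hypothetical name, not an HBase API.

```python
from typing import Dict, List, Tuple

Put = Tuple[str, int, bytes]          # (row key, timestamp in ms, value)
Store = Dict[Tuple[str, int], bytes]  # cells keyed by (row key, timestamp)


def bootstrap_destination(edits: List[Put], queue_from_ms: int,
                          snapshot_at_ms: int) -> Store:
    """Restore a snapshot on the destination, then drain the queued WAL edits."""
    # Step 2: the snapshot contains every edit written up to the snapshot time.
    dest: Store = {(row, ts): value for row, ts, value in edits
                   if ts <= snapshot_at_ms}
    # Step 3: the bridge drains everything queued since replication was set up.
    for row, ts, value in edits:
        if ts >= queue_from_ms:
            dest[(row, ts)] = value
    return dest
```

With queuing started before the snapshot, the two ranges overlap and every edit reaches the destination; if queuing started after the snapshot, edits written in between would be lost.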

          Enable CopyTable against Replication Bridge Server

          This option is the least desirable one because it would involve significant code changes to the replication bridge server: a faked root znode, plus support for root table scans, meta table scans, and the delete and multi commands.

          Jeffrey Zhong added a comment -

          Jean-Daniel Cryans, you mentioned another valid scenario for 0.94->0.96 replication support. Minimizing the upgrade time (or pain) is just one of the driving factors. Thanks.

          Jean-Daniel Cryans added a comment -

          When we talked about this, I didn't fully understand that the use case you had in mind was to enable people to upgrade with minimal downtime. I'm personally more interested in supporting simple master-slave replication where the master is the one that stays on 0.94 the longest, which is something people using replication will be facing.

          Right now, upgrading a master-slave DR setup to 0.96 will involve either upgrading everything to 0.96 at the same time or pausing replication while the master isn't upgraded. Himanshu Vashishtha was saying that he tested the latter (although not explicitly) and it works, but still you aren't replicating for X amount of time.

          It would be nice to have a way to keep replicating.

          Jeffrey Zhong added a comment -

          Thanks a lot for the input on this. I haven't tried loading both versions of the HBase client in one JVM. Without #2, we can still use replication to reduce the upgrade downtime to seconds.

          "Has anyone asked for it?"

          So far no one has asked for this. With 0.94->0.96 replication support, we can ease the upgrade pain and encourage people to test 0.96 against shadow clusters of their 0.94 production env.

          Lars Hofhansl added a comment -

          #2 does not work; it's something we've been thinking about in order to do a 0.96 upgrade eventually. So far we've punted this to "some OSGi magic" that will let us load both clients into the same VM.

          stack added a comment -

          On 1., a bridge would be cool. Has anyone asked for it? (No to polluting 0.96 w/ 0.94 rpc code – at least at first blush)

          On 2., have you tried it?

          Jeffrey Zhong created issue -

            People

            • Assignee: Unassigned
            • Reporter: Jeffrey Zhong
            • Votes: 0
            • Watchers: 17
