HBASE-27109

Move replication queue storage from zookeeper to a separated HBase table


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha-4
    • Component/s: Replication
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      We introduced a table based replication queue storage in this issue. The queue data is now stored in the hbase:replication table. Replication queues were the last piece of persistent data on zookeeper, so after this change all remaining data on zookeeper is transient and can safely be cleaned up: a cluster restart can rebuild everything.
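
      As a quick illustration (not part of the change itself), the new queue data can be inspected with the ordinary HBase client API. The sketch below simply scans hbase:replication and dumps the raw cells; it makes no assumption about the exact row key or column layout used by the table based queue storage.

      // Minimal sketch: dump the raw contents of the hbase:replication table.
      // The row key / column layout is an implementation detail, so we only print cells.
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.CellUtil;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.TableName;
      import org.apache.hadoop.hbase.client.Connection;
      import org.apache.hadoop.hbase.client.ConnectionFactory;
      import org.apache.hadoop.hbase.client.Result;
      import org.apache.hadoop.hbase.client.ResultScanner;
      import org.apache.hadoop.hbase.client.Scan;
      import org.apache.hadoop.hbase.client.Table;
      import org.apache.hadoop.hbase.util.Bytes;

      public class DumpReplicationQueueTable {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          try (Connection conn = ConnectionFactory.createConnection(conf);
              Table table = conn.getTable(TableName.valueOf("hbase:replication"));
              ResultScanner scanner = table.getScanner(new Scan())) {
            for (Result result : scanner) {
              result.listCells().forEach(cell -> System.out.println(
                Bytes.toString(CellUtil.cloneRow(cell)) + " "
                  + Bytes.toString(CellUtil.cloneFamily(cell)) + ":"
                  + Bytes.toString(CellUtil.cloneQualifier(cell)) + " = "
                  + Bytes.toStringBinary(CellUtil.cloneValue(cell))));
            }
          }
        }
      }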

      The data structure has changed a bit: we now store only an offset per WAL group instead of storing all the WAL files of a WAL group. Please see the replication internals section in our ref guide for more details.
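
      For illustration only, the sketch below shows the rough shape of what is now tracked per WAL group: a single file name plus a byte offset, rather than a list of WAL files. The class and field names are invented for this example and are not the actual classes in the patch.

      // Conceptual sketch (invented names): the per WAL group state kept by the
      // table based queue storage is just "which WAL file, and how far into it".
      public final class WalGroupOffset {
        private final String walFile; // the WAL file the offset points into
        private final long offset;    // position already replicated within that file

        public WalGroupOffset(String walFile, long offset) {
          this.walFile = walFile;
          this.offset = offset;
        }

        public String getWalFile() {
          return walFile;
        }

        public long getOffset() {
          return offset;
        }
      }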

      We also had to break a cyclic dependency: creating a new WAL writer used to require writing to the replication queue storage first, but with a table based replication queue storage you need a WAL writer before you can update the table. So we no longer record a queue entry when creating a new WAL writer instance. The downside of this change is that the logic for claiming queues and for the WAL cleaner is much more complicated. See AssignReplicationQueuesProcedure and ReplicationLogCleaner for more details if you are interested. A minimal sketch of the ordering problem, using invented interface names rather than the real HBase classes, follows below.
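
      // Why recording a queue entry at WAL creation time no longer works once the
      // queue storage itself lives in an HBase table (invented names for illustration).
      interface ReplicationQueueStorage {
        // With table based storage this is a Put against hbase:replication ...
        void setOffset(String queueId, String walGroup, String wal, long offset);
      }

      interface WalProvider {
        // ... but serving that Put needs a WAL writer for the hbase:replication region,
        // and if creating a writer first called setOffset() we would have a cycle.
        AutoCloseable createWriter(String walName) throws Exception;
      }

      // Resolution in this issue, conceptually: createWriter() no longer touches the
      // queue storage at all. Offsets are only written when a replication source ships
      // edits, and AssignReplicationQueuesProcedure / ReplicationLogCleaner compensate
      // for WAL files that never got a queue record.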

      Notice that we use a separate WAL provider for the hbase:replication table, so you will see a new WAL file on the region server which hosts hbase:replication. If we did not do this, updates to hbase:replication would also generate WAL edits in a WAL file that replication has to track, which would in turn trigger more updates to hbase:replication since the replication offset has advanced. That way we would generate a lot of garbage in our WAL files even if nothing is written to the cluster, so a separate WAL provider which is not tracked by replication is necessary here.
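
      If you want to see the effect, the extra WAL file shows up under the WAL directory of the region server hosting hbase:replication. The small sketch below just lists the per region server WAL directories with the HDFS API; it assumes hbase.rootdir is set in the client configuration and does not try to parse provider specific file name suffixes.

      // Minimal sketch: list every WAL file per region server directory.
      // Assumes hbase.rootdir is present in the configuration on the classpath.
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.hbase.HBaseConfiguration;

      public class ListWalFiles {
        public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          Path rootDir = new Path(conf.get("hbase.rootdir"));
          FileSystem fs = rootDir.getFileSystem(conf);
          // Region server WAL directories live under <hbase.rootdir>/WALs/<server-name>/
          for (FileStatus server : fs.listStatus(new Path(rootDir, "WALs"))) {
            for (FileStatus wal : fs.listStatus(server.getPath())) {
              System.out.println(wal.getPath());
            }
          }
        }
      }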

      The data migration is done automatically during a rolling upgrade. Migration via a full cluster restart is also supported, but please make sure you restart the master with the new code first. The replication peers are disabled during the migration and no queue claiming is scheduled at the same time, so you may see a lot of unfinished SCPs during the migration; do not worry, this does not block normal failover and all regions will still be assigned. The replication peers are enabled again once the migration is done, with no manual operations needed.
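
      One simple way to confirm the migration has finished is to check that the peers are enabled again, for example with the Admin API. The sketch below is a generic peer state listing, not a tool shipped with this change.

      // Minimal sketch: list replication peers and whether each one is enabled.
      import java.util.List;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.client.Admin;
      import org.apache.hadoop.hbase.client.Connection;
      import org.apache.hadoop.hbase.client.ConnectionFactory;
      import org.apache.hadoop.hbase.replication.ReplicationPeerDescription;

      public class CheckPeerStates {
        public static void main(String[] args) throws Exception {
          try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
              Admin admin = conn.getAdmin()) {
            List<ReplicationPeerDescription> peers = admin.listReplicationPeers();
            for (ReplicationPeerDescription peer : peers) {
              System.out.println(peer.getPeerId() + " enabled=" + peer.isEnabled());
            }
          }
        }
      }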

      The ReplicationSyncUp tool is also affected. The goal of this tool is to replicate data to the peer cluster while the source cluster is down, but with the replication queue data stored in an hbase table we cannot read the newest queue data when the source cluster is down. So the tool now reads the region directory directly to load all the replication queue data into memory and then does the sync up work. Because the newest offsets may be missing we may replicate more data than strictly necessary, but this does not affect correctness.
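
      The tool is still launched as a command line program, typically through the hbase launcher script with the fully qualified class name. The sketch below drives it programmatically via ToolRunner and assumes, as is usual for HBase command line tools, that the class implements Hadoop's Tool interface; treat these details as an assumption and check the ref guide for the exact invocation.

      // Minimal sketch: run ReplicationSyncUp programmatically (class name as in the
      // HBase source tree; assumes it implements org.apache.hadoop.util.Tool).
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.replication.regionserver.ReplicationSyncUp;
      import org.apache.hadoop.util.ToolRunner;

      public class RunReplicationSyncUp {
        public static void main(String[] args) throws Exception {
          // Roughly equivalent to:
          //   bin/hbase org.apache.hadoop.hbase.replication.regionserver.ReplicationSyncUp
          int exit = ToolRunner.run(HBaseConfiguration.create(), new ReplicationSyncUp(), args);
          System.exit(exit);
        }
      }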

    Description

      This is a more specific issue based on the work already done in HBASE-15867.

      Attachments

        Issue Links

          Activity

            People

              Assignee:
              zhangduo Duo Zhang
              Reporter:
              zhangduo Duo Zhang
              Votes:
              0
              Watchers:
              9

              Dates

                Created:
                Updated:
                Resolved: