  1. HBase
  2. HBASE-25295

Refactor the locate WAL logic in ReplicationSource

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Replication
    • Labels: None

    Description

      When cluster replication is enabled and a RegionServer crashes, its WALs are moved from the WALs dir to the oldWALs dir, and its replication queue is transferred to another RegionServer's replication queue. Before the crash:
       
      HDFS layout (WAL Storage)
      /hbase/WALs/RS1/1.log
      /hbase/WALs/RS1/2.log
      /hbase/WALs/RS1/3.log
      ZooKeeper layout (Replication queue storage)
      /hbase/replication/rs/RS1/peerId/1.log
      /hbase/replication/rs/RS1/peerId/2.log
      /hbase/replication/rs/RS1/peerId/3.log
       
      After failover has finished:
      HDFS layout (WAL Storage)
      /hbase/.oldWALs/1.log
      /hbase/.oldWALs/2.log
      /hbase/.oldWALs/3.log
      ZooKeeper layout (Replication queue storage)
      /hbase/replication/rs/RS2/peerId-RS1/1.log
      /hbase/replication/rs/RS2/peerId-RS1/2.log
      /hbase/replication/rs/RS2/peerId-RS1/3.log
       
      If hbase.separate.oldlogdir.by.regionserver is enabled, the HDFS layout may instead be:
      HDFS layout (WAL Storage)
      /hbase/.oldWALs/RS1/1.log
      /hbase/.oldWALs/RS1/2.log
      /hbase/.oldWALs/RS1/3.log
       
      If RS2 then crashes, the HDFS layout does not change, but the ZooKeeper layout may change:
      ZooKeeper layout (Replication queue storage)
      /hbase/replication/rs/RS3/peerId-RS1-RS2/1.log
      /hbase/replication/rs/RS3/peerId-RS1-RS2/2.log
      /hbase/replication/rs/RS3/peerId-RS1-RS2/3.log
       
      So even if the replication queue is transferred many times, the HDFS layout never changes.
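
The queue naming shown above (peerId, then one suffix per crashed server) can be sketched as a small parser. This is a minimal illustration, not HBase code: the class QueueIdSketch and its methods are hypothetical, and it assumes the simplified server names used in this description (RS1, RS2), which contain no '-' characters. Real server names can contain hyphens, so an actual implementation needs more careful parsing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not an HBase class. */
public final class QueueIdSketch {
    private QueueIdSketch() {}

    /** Returns the peer id: the part of the queue id before the first '-'. */
    public static String peerId(String queueId) {
        int dash = queueId.indexOf('-');
        return dash < 0 ? queueId : queueId.substring(0, dash);
    }

    /** Returns the crashed servers encoded in the queue id, oldest first. */
    public static List<String> deadServers(String queueId) {
        // Assumption: server names contain no '-' (true for RS1, RS2 in this example).
        String[] parts = queueId.split("-");
        return new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
    }

    /**
     * Returns the server that originally wrote the WALs, or null when the
     * queue was never transferred. Its name is what appears in the HDFS paths.
     */
    public static String originalServer(String queueId) {
        List<String> dead = deadServers(queueId);
        return dead.isEmpty() ? null : dead.get(0);
    }
}
```

For the queue id peerId-RS1-RS2, the first server in the chain (RS1) is the one whose name appears in the HDFS paths, no matter how many later servers claimed the queue.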
       
      Another case is a master-cluster disaster in which the failover work has not finished. The ReplicationSyncUp tool can then start a replication source to replicate the WAL data. In that case the HDFS layout has two more cases to consider:
      /hbase/WALs/RS1/1.log
      /hbase/WALs/RS1/2.log
      /hbase/WALs/RS1/3.log
      or
      /hbase/WALs/RS1-splitting/1.log
      /hbase/WALs/RS1-splitting/2.log
      /hbase/WALs/RS1-splitting/3.log
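
Taken together, the layouts in this description give four places where a WAL of a crashed server may be found. The sketch below enumerates them and returns the first one that exists. It is only an illustration under stated assumptions: WalLocationSketch, candidatePaths and locate are hypothetical names, paths are plain strings rather than Hadoop Path objects, the directory names are copied from the listings above, and the existence check is passed in as a predicate so the example runs without HDFS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical helper for illustration only; not the ReplicationSource logic. */
public final class WalLocationSketch {
    private WalLocationSketch() {}

    /** Lists every location described in this issue for a WAL written by the given server. */
    public static List<String> candidatePaths(String rootDir, String server, String walName) {
        List<String> paths = new ArrayList<>();
        // Failover not started: the WAL is still in the server's WAL dir.
        paths.add(rootDir + "/WALs/" + server + "/" + walName);
        // Failover in progress: the WAL dir has been renamed with a -splitting suffix.
        paths.add(rootDir + "/WALs/" + server + "-splitting/" + walName);
        // Failover finished: the WAL was archived to the old WAL dir.
        paths.add(rootDir + "/.oldWALs/" + walName);
        // Failover finished with hbase.separate.oldlogdir.by.regionserver enabled.
        paths.add(rootDir + "/.oldWALs/" + server + "/" + walName);
        return paths;
    }

    /** Returns the first candidate for which the supplied existence check succeeds. */
    public static Optional<String> locate(String rootDir, String server, String walName,
                                          Predicate<String> exists) {
        return candidatePaths(rootDir, server, walName).stream().filter(exists).findFirst();
    }
}
```

Because the server argument is always the original writer of the WAL, the lookup does not depend on how many times the replication queue has been transferred.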

          People

            yuqi yuqi
            zghao Guanghao Zhang
