[HBASE-25295] Refactor the locate WAL logic in ReplicationSource - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Replication
Labels: None

Description

When cluster replication is enabled and a RegionServer crashes, its WALs are moved from the WALs dir to the oldWALs dir, and its replication queue is transferred to another RegionServer's replication queue.

HDFS layout (WAL Storage)
/hbase/WALs/RS1/1.log
/hbase/WALs/RS1/2.log
/hbase/WALs/RS1/3.log
ZooKeeper layout (Replication queue storage)
/hbase/replication/rs/RS1/peerId/1.log
/hbase/replication/rs/RS1/peerId/2.log
/hbase/replication/rs/RS1/peerId/3.log

After failover finishes:
HDFS layout (WAL Storage)
/hbase/.oldWALs/1.log
/hbase/.oldWALs/2.log
/hbase/.oldWALs/3.log
ZooKeeper layout (Replication queue storage)
/hbase/replication/rs/RS2/peerId-RS1/1.log
/hbase/replication/rs/RS2/peerId-RS1/2.log
/hbase/replication/rs/RS2/peerId-RS1/3.log

If hbase.separate.oldlogdir.by.regionserver is enabled, the HDFS layout may instead be:
HDFS layout (WAL Storage)
/hbase/.oldWALs/RS1/1.log
/hbase/.oldWALs/RS1/2.log
/hbase/.oldWALs/RS1/3.log

Then, if RS2 crashes, the HDFS layout does not change, but the ZooKeeper layout may change:
ZooKeeper layout (Replication queue storage)
/hbase/replication/rs/RS3/peerId-RS1-RS2/1.log
/hbase/replication/rs/RS3/peerId-RS1-RS2/2.log
/hbase/replication/rs/RS3/peerId-RS1-RS2/3.log

So even if the replication queue is transferred many times, the HDFS layout never changes.
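
The queue naming shown above can be sketched in a few lines of Java. This is only an illustration of the layout in this description, not the HBase implementation: the class QueueIdSketch and its methods are hypothetical, and the sketch assumes the simplified names used here (peerId, RS1, RS2), which contain no '-' characters. Real server names can contain hyphens, so real parsing has to be more careful.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the queue naming from the description.
// Assumption: neither the peer id nor the server names contain '-'.
final class QueueIdSketch {

    // Queue id after a live server claims a dead server's queue,
    // e.g. "peerId" claimed from RS1 becomes "peerId-RS1".
    static String claim(String queueId, String deadServer) {
        return queueId + "-" + deadServer;
    }

    // The peer id is the part before the first '-'.
    static String peerId(String queueId) {
        int idx = queueId.indexOf('-');
        return idx < 0 ? queueId : queueId.substring(0, idx);
    }

    // Dead servers in the order the queue passed through them.
    static List<String> deadServers(String queueId) {
        String[] parts = queueId.split("-");
        return new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
    }

    // The server that originally wrote the WALs, or null if the queue
    // was never transferred. The WAL location on HDFS depends on this
    // server only, not on later owners of the queue.
    static String originalServer(String queueId) {
        List<String> dead = deadServers(queueId);
        return dead.isEmpty() ? null : dead.get(0);
    }
}
```

Under these assumptions, the queue peerId-RS1-RS2 still resolves to RS1 as the original server, which matches the observation that the HDFS layout stays the same however often the queue moves.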

Another case is a disaster on the master cluster in which the failover work does not finish. The ReplicationSyncUp tool can then start a replication source to replicate the WAL data. In that situation there are two more HDFS layouts to consider:
/hbase/WALs/RS1/1.log
/hbase/WALs/RS1/2.log
/hbase/WALs/RS1/3.log
or
/hbase/WALs/RS1-splitting/1.log
/hbase/WALs/RS1-splitting/2.log
/hbase/WALs/RS1-splitting/3.log
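
Taken together, the layouts in this description suggest that a WAL belonging to a given original server can only be in a small, fixed set of places. A minimal sketch of that idea follows. The class WalLocatorSketch, its method names, and the lookup order are assumptions made for illustration, and the existence check is passed in as a predicate so that the sketch does not depend on any HDFS API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch: enumerate the places a WAL may be, per the layouts above.
final class WalLocatorSketch {

    // Candidate locations for one WAL file written by `server`.
    // The order is an assumption, not taken from HBase.
    static List<String> candidatePaths(String rootDir, String server, String walName) {
        List<String> candidates = new ArrayList<>();
        // Server still alive, or failover not started yet.
        candidates.add(rootDir + "/WALs/" + server + "/" + walName);
        // Failover started but WAL splitting not finished.
        candidates.add(rootDir + "/WALs/" + server + "-splitting/" + walName);
        // Failover finished, default archive layout.
        candidates.add(rootDir + "/.oldWALs/" + walName);
        // Failover finished, hbase.separate.oldlogdir.by.regionserver enabled.
        candidates.add(rootDir + "/.oldWALs/" + server + "/" + walName);
        return candidates;
    }

    // Return the first candidate for which `exists` is true, if any.
    static Optional<String> locate(List<String> candidates, Predicate<String> exists) {
        for (String path : candidates) {
            if (exists.test(path)) {
                return Optional.of(path);
            }
        }
        return Optional.empty();
    }
}
```

For example, calling candidatePaths("/hbase", "RS1", "1.log") and passing the result to locate with a predicate backed by a real file system check would return whichever of the four locations currently holds 1.log, or an empty Optional if none does.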