Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Abandoned
- 3.0.0-alpha-1, 2.3.0, 1.6.0, hbase-operator-tools-1.1.0
- None
- None
Description
The replication source discards a WAL file in many cases when it encounters an exception while reading it. This can cause data loss, and the underlying reason for the failed read remains hidden. Only in certain scenarios should the replication source drop the current WAL and move on to the next one.
This JIRA aims to add an hbck option that checks the WAL files in replication queues for inconsistencies, along with an option to fix them.
The fix can be to remove the file from the replication queue in ZooKeeper and from the in-memory state of the replication source manager and the replication sources.
A region server endpoint call from the hbck client to the region server can be used to achieve this.
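The fix described above touches two places: the persisted queue and the in-memory queue. The following is a minimal sketch of that idea in plain Java, not HBase code. The class `ReplicationQueueFixSketch`, its two maps, and all method names are hypothetical stand-ins for the ZooKeeper-backed queue and the state held by the replication source manager.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: none of these names come from HBase itself.
final class ReplicationQueueFixSketch {

    // Stand-in for the replication queues persisted in ZooKeeper (queue id -> ordered WAL names).
    private final Map<String, Deque<String>> persistedQueues = new HashMap<>();

    // Stand-in for the copy held in memory by the replication source manager and sources.
    private final Map<String, Deque<String>> inMemoryQueues = new HashMap<>();

    // Adds a WAL to both views of the queue, as happens when a new WAL is created.
    void enqueue(String queueId, String walName) {
        persistedQueues.computeIfAbsent(queueId, k -> new ArrayDeque<>()).addLast(walName);
        inMemoryQueues.computeIfAbsent(queueId, k -> new ArrayDeque<>()).addLast(walName);
    }

    // Removes the WAL from the persisted queue first, then from memory.
    // Returns true if it was present in at least one of them.
    boolean removeWal(String queueId, String walName) {
        boolean fromPersisted = removeFrom(persistedQueues, queueId, walName);
        boolean fromMemory = removeFrom(inMemoryQueues, queueId, walName);
        return fromPersisted || fromMemory;
    }

    // Next WAL the (modeled) replication source would read, or null if the queue is empty.
    String nextWal(String queueId) {
        Deque<String> queue = inMemoryQueues.get(queueId);
        return queue == null ? null : queue.peekFirst();
    }

    private static boolean removeFrom(Map<String, Deque<String>> queues,
                                      String queueId, String walName) {
        Deque<String> queue = queues.get(queueId);
        return queue != null && queue.remove(walName);
    }
}
```

In the proposal, the in-memory part of this removal would run inside the region server, which is why a region server endpoint call from the hbck client is suggested.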
hbck can be configured with the following options:
-softCheckReplicationWAL: Tries to open only the oldest WAL (the WAL currently being read by the replication source) from each replication queue. If a position is associated with it, hbck also seeks to that position and reads an entry from there.
-hardCheckReplicationWAL: Checks all WAL paths in the replication queues by reading them completely to make sure they are OK.
-fixMissingReplicationWAL: Removes from the replication queues the WALs that are not present on HDFS.
-fixCorruptedReplicationWAL: Removes from the replication queues the WALs that are corrupted (based on the findings of softCheck/hardCheck). The WALs are also moved to a quarantine directory.
-rollAndFixCorruptedReplicationWAL: If the current WAL is corrupted, it is first rolled over and then handled in the same way as with the -fixCorruptedReplicationWAL option.
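How the fix options above could combine is sketched below as a small decision function. It is only an illustration under assumptions: the enums `FixOption`, `WalState` and `Action` and the method `decide` are hypothetical, "current WAL" is read here as the WAL still being written, and the issue does not say what -fixCorruptedReplicationWAL alone should do with such a WAL (the sketch treats it like any other corrupted WAL).

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the proposed hbck options; not part of HBase or hbck.
final class ReplicationWalCheckSketch {

    // The three fix options from the list above.
    enum FixOption { FIX_MISSING, FIX_CORRUPTED, ROLL_AND_FIX_CORRUPTED }

    // Assumed result of a soft or hard check for a single WAL.
    enum WalState { OK, MISSING, CORRUPTED }

    // Assumed actions hbck could take for a single WAL.
    enum Action {
        NONE, REPORT_ONLY, REMOVE_FROM_QUEUE,
        QUARANTINE_AND_REMOVE, ROLL_THEN_QUARANTINE_AND_REMOVE
    }

    // Maps a check result plus the enabled fix options to an action.
    static Action decide(WalState state, boolean isActiveWal, Set<FixOption> options) {
        switch (state) {
            case OK:
                return Action.NONE;
            case MISSING:
                // Missing on HDFS: only removed from the queue when the fix is requested.
                return options.contains(FixOption.FIX_MISSING)
                        ? Action.REMOVE_FROM_QUEUE : Action.REPORT_ONLY;
            case CORRUPTED: {
                boolean rollAndFix = options.contains(FixOption.ROLL_AND_FIX_CORRUPTED);
                if (rollAndFix && isActiveWal) {
                    // Assumption: roll the WAL first so nothing is writing to it.
                    return Action.ROLL_THEN_QUARANTINE_AND_REMOVE;
                }
                if (rollAndFix || options.contains(FixOption.FIX_CORRUPTED)) {
                    return Action.QUARANTINE_AND_REMOVE;
                }
                return Action.REPORT_ONLY;
            }
            default:
                throw new IllegalArgumentException("Unknown state: " + state);
        }
    }

    public static void main(String[] args) {
        Set<FixOption> options = EnumSet.of(FixOption.FIX_MISSING, FixOption.ROLL_AND_FIX_CORRUPTED);
        System.out.println(decide(WalState.MISSING, false, options));
        System.out.println(decide(WalState.CORRUPTED, true, options));
    }
}
```

Without any fix option the function only reports, which matches the split between the check options and the fix options in the list.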
Attachments
Issue Links
- is related to
  - HBASE-18137 Replication gets stuck for empty WALs (Resolved)
- relates to
  - HBASE-12126 Region server coprocessor endpoint (Closed)
- links to