Details
- Bug
- Status: Resolved
- Blocker
- Resolution: Fixed
- 1.4.9, 1.4.10
- None
- Reviewed
Description
When a cluster is passive (receiving edits only via replication) in a cyclic replication setup of two clusters, the oldWALs size keeps growing. On analysis, we observed the following behaviour:
- A new entry is added to the WAL (an edit replicated from the other cluster).
- ReplicationSourceWALReaderThread (RSWALRT) reads the entry and applies the configured filters. Because of the cyclic replication setup, ClusterMarkingEntryFilter discards the entry, since it originated from the other cluster.
- The filtered entry is null, so RSWALRT neither updates the batch stats (WALEntryBatch.lastWalPosition) nor puts a batch in the entryBatchQueue.
- The ReplicationSource thread stays blocked in entryBatchQueue.take().
- As a result, ReplicationSource#updateLogPosition is never invoked and the WAL file is never removed from the replication queue.
- Hence the LogCleaner on the master does not delete the oldWAL files from Hadoop.
NOTE: When a new edit is added via hbase-client, the ReplicationSource thread processes it and clears the old WAL files from the replication queues, and the master then cleans up the WALs.
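The sequence above can be reproduced in a small standalone model. This is only a sketch and does not use HBase classes: `Entry`, `Batch`, `filter`, `readBatch`, `replicatedPosition` and the `shipEmptyBatches` flag are hypothetical names invented for illustration, and the `true` branch shows one possible remedy (ship a batch that carries only the position), not necessarily the fix that was committed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class WalPositionSketch {
    /** Hypothetical WAL entry: end offset in the file and the cluster that originated the edit. */
    static final class Entry {
        final long endPosition;
        final String originCluster;
        Entry(long endPosition, String originCluster) {
            this.endPosition = endPosition;
            this.originCluster = originCluster;
        }
    }

    /** Hypothetical stand-in for WALEntryBatch. */
    static final class Batch {
        final List<Entry> entries;
        final long lastWalPosition;
        Batch(List<Entry> entries, long lastWalPosition) {
            this.entries = entries;
            this.lastWalPosition = lastWalPosition;
        }
    }

    /** Models the cluster-marking filter: edits that came from the peer are dropped (null). */
    static Entry filter(Entry e, String peerCluster) {
        return peerCluster.equals(e.originCluster) ? null : e;
    }

    /** Models the reader thread. Returns null when nothing is queued for the source. */
    static Batch readBatch(List<Entry> wal, String peerCluster, boolean shipEmptyBatches) {
        List<Entry> kept = new ArrayList<>();
        long position = 0L;
        for (Entry e : wal) {
            position = e.endPosition; // the reader has consumed the WAL up to here
            Entry filtered = filter(e, peerCluster);
            if (filtered != null) {
                kept.add(filtered);
            }
        }
        if (kept.isEmpty() && !shipEmptyBatches) {
            return null; // reported behaviour: position is lost, nothing is queued
        }
        return new Batch(kept, position);
    }

    /** Models the source thread: the log position advances only when a batch arrives. */
    static long replicatedPosition(List<Entry> wal, String peerCluster, boolean shipEmptyBatches) {
        BlockingQueue<Batch> entryBatchQueue = new LinkedBlockingQueue<>();
        Batch batch = readBatch(wal, peerCluster, shipEmptyBatches);
        if (batch != null) {
            entryBatchQueue.add(batch);
        }
        // The real source blocks in take(); poll() is used so the sketch terminates.
        Batch next = entryBatchQueue.poll();
        return next == null ? 0L : next.lastWalPosition;
    }

    public static void main(String[] args) {
        // Passive cluster A: every edit in its WAL was replicated from peer B.
        List<Entry> wal = Arrays.asList(
            new Entry(100L, "B"), new Entry(200L, "B"), new Entry(300L, "B"));
        System.out.println(replicatedPosition(wal, "B", false)); // 0: position never advances
        System.out.println(replicatedPosition(wal, "B", true));  // 300: WAL can be released
    }
}
```

In the model, a position stuck at 0 corresponds to the WAL staying in the replication queue, which is what keeps the LogCleaner from deleting it. Adding a single entry whose origin is the local cluster makes the position advance in both modes, which matches the NOTE above.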
Please suggest a solution.
Attachments
Issue Links
- is related to
  - HBASE-23169 Random region server aborts while clearing Old Wals (Open)
- relates to
  - HBASE-23205 Correctly update the position of WALs currently being replicated. (Resolved)