While testing with 1.4.10, we observed that a lot of old WALs were not removed from the archives or from their corresponding replication queues.
The stacked old WALs were empty or had no entries to be replicated (their tables are not in the replication table_cfs).
As described in
HBASE-22784, if no entries to be replicated are appended to a WAL, the log position is never updated, and as a consequence none of the WALs are removed. This issue has existed since HBASE-15995.
I think old WALs would no longer stack up with
HBASE-22784 applied, but there are still a few things to fix, as described below.
case 1) The log position can be updated wrongly after a log roll, because the lastWalPath of a batch might not point to the WAL currently being read.
- For example, suppose the last entry added to a batch was read at position P1 in WAL W1, the WAL then rolled, and the reader read to the end of the old WAL and continued reading entries from the new WAL W2 until the batch size was reached; the current read position in W2 is P2. In this case, the batch passed to the shipper carries walPath W1 and position P2, so the shipper will try to record position P2 for W1. This can result in data inconsistency in the recovery case, or in a failed update to ZooKeeper (the znode may no longer exist because of previous log position updates; I guess this is the same case as HBASE-23169?). A sketch of the mismatch follows below.
case 2) The log position may not be updated, or may be updated to a wrong position, because of the pendingShipment flag introduced by HBASE-22784.
- In the shipper thread, updating the log position is not guaranteed, because of how pendingShipment is set back to false.
If the reader sets the flag to true right after the shipper sets it to false inside updateLogPosition(), the shipper won't update the log position.
On the other hand, while the reader is reading filtered entries, if the shipper concurrently sets the flag to false, the reader will update the log position to the current read position. This may lose data in the recovery case. Both interleavings are sketched below.
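A minimal sketch of the two interleavings, with illustrative names and volatile fields standing in for the actual reader/shipper state (this is not the real HBase code): the flag is read and written by both threads without a common lock, so check-then-act sequences on it can interleave badly.

{code:java}
final class PendingShipmentRace {
  private volatile boolean pendingShipment;
  private volatile long logPosition;

  // Reader thread: marks a batch as handed off to the shipper.
  void onBatchHandedToShipper() {
    pendingShipment = true;            // (R1)
  }

  // Reader thread: when a whole batch was filtered out, it advances
  // the position itself, but only if no shipment is pending.
  void onAllEntriesFiltered(long readPosition) {
    if (!pendingShipment) {            // (R2) check...
      logPosition = readPosition;      // (R3) ...then act
    }
  }

  // Shipper thread: persists the shipped position, then clears the flag.
  void updateLogPosition(long shippedPosition) {
    logPosition = shippedPosition;     // (S1)
    pendingShipment = false;           // (S2)
  }
}
// Because neither thread holds a lock across its check and its write,
// the flag can be flipped by the other thread in between: the shipper's
// (S2) can clear a flag the reader just set for a new batch, so that
// batch's position update is effectively skipped; or the reader's (R2)
// can observe the cleared flag and (R3) record a read position ahead of
// what was actually shipped, losing data on recovery.
{code}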
case 3) A large number of log position updates can happen when most WAL entries are filtered out by TableCfWALEntryFilter.
- I think it would be better to reduce the number of log position updates in that case, because
- ZooKeeper writes are more expensive operations than reads (writes involve synchronizing the state of all servers), and
- even if the read position is not updated, it is harmless, because all the entries will be filtered out again during the recovery process.
- It would be enough to update the log position only when the WAL rolls in that case (so old WALs can still be cleaned up); see the sketch after this list.
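A minimal sketch of that policy, with hypothetical names (this is the proposal, not existing HBase code): persist the position for a fully filtered batch only when the reader has moved on to a new WAL.

{code:java}
import org.apache.hadoop.fs.Path;

final class FilteredBatchPositionPolicy {
  private Path lastUpdatedWal;

  // Decide whether a batch's position should be written to ZooKeeper.
  boolean shouldUpdatePosition(Path currentWal, boolean allEntriesFiltered) {
    if (!allEntriesFiltered) {
      return true; // real entries were shipped: position must advance
    }
    // Everything was filtered: skipping the update is safe (recovery
    // just re-filters the same entries), so only write when the WAL
    // has rolled, which is what lets the old WAL be cleaned up.
    boolean rolled = !currentWal.equals(lastUpdatedWal);
    if (rolled) {
      lastUpdatedWal = currentWal;
    }
    return rolled;
  }
}
{code}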
In addition, during this work I found a minor bug: the replication buffer size is updated wrongly, because the total buffer size is decreased by the size of bulk-loaded files. I'd like to fix it, if that's OK. A sketch of the asymmetry follows below.
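A minimal sketch of one way to read that bug, with hypothetical names standing in for the real accounting fields: if the quota is acquired with an entry's heap size only, releasing it with the bulk-loaded file size included drives the counter below its true value.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

final class ReplicationBufferQuota {
  private final AtomicLong totalBufferUsed = new AtomicLong();

  // Quota is acquired with the entry's in-memory size only.
  void acquire(long entryHeapSize) {
    totalBufferUsed.addAndGet(entryHeapSize);
  }

  // Buggy release: also subtracts bulk-loaded file bytes that were
  // never added, so totalBufferUsed drifts below its true value and
  // the global quota stops limiting memory as intended.
  void releaseBuggy(long entryHeapSize, long bulkLoadFileSize) {
    totalBufferUsed.addAndGet(-(entryHeapSize + bulkLoadFileSize));
  }

  // Fixed release: symmetric with acquire().
  void release(long entryHeapSize) {
    totalBufferUsed.addAndGet(-entryHeapSize);
  }
}
{code}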
I removed the changes above and made a separate JIRA: