As part of the fix for ZOOKEEPER-1797, the call to FileTxnSnapLog.getSnapshotLogs() was removed from PurgeTxnLog.java. As a result, some old-looking but required txn log files can be deleted, resulting in data corruption or loss.
For example, consider the following:
2. The following files exist:
log.100 - txn log spanning zxid=100 through zxid=140 (inclusive)
snapshot.110 - snapshot as of zxid=110
snapshot.120 - snapshot as of zxid=120
snapshot.130 - snapshot as of zxid=130
The above scenario is possible when snapshots have been taken multiple times without an accompanying log rollover, which can happen if the server was running as a learner.
3. PurgeTxnLog retains all snapshots but deletes log.100 because its starting zxid (100) is older than the zxid of the oldest snapshot (110). This results in the loss of transactions in the range 131-140, which exist only in log.100.
Before the fix for ZOOKEEPER-1797, this was avoided by the call to FileTxnSnapLog.getSnapshotLogs(), which finds and retains the newest txn log file with starting zxid < oldest retained snapshot's highest zxid.
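
To illustrate the difference, here is a minimal standalone sketch of the two retention rules applied to the example above. It is not the actual PurgeTxnLog or FileTxnSnapLog code: the class PurgeSketch and the helpers retainNaive and retainWithPrecedingLog are hypothetical names, and txn logs are modeled only by their starting zxid.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical helpers, not ZooKeeper's implementation.
class PurgeSketch {

    // Keeps only logs whose starting zxid is >= the oldest retained snapshot's zxid.
    // This models the behavior described in this report.
    static List<Long> retainNaive(List<Long> logStarts, long oldestSnapZxid) {
        List<Long> kept = new ArrayList<>();
        for (long start : logStarts) {
            if (start >= oldestSnapZxid) {
                kept.add(start);
            }
        }
        return kept;
    }

    // Additionally keeps the newest log that starts before the oldest retained
    // snapshot, since that log may hold transactions newer than the snapshot.
    static List<Long> retainWithPrecedingLog(List<Long> logStarts, long oldestSnapZxid) {
        List<Long> kept = retainNaive(logStarts, oldestSnapZxid);
        long newestOlder = -1;
        for (long start : logStarts) {
            if (start < oldestSnapZxid && start > newestOlder) {
                newestOlder = start;
            }
        }
        if (newestOlder >= 0) {
            kept.add(0, newestOlder);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> logs = List.of(100L); // log.100 covers zxid 100..140
        long oldestSnapshot = 110L;      // snapshot.110 is the oldest retained snapshot
        System.out.println("naive: " + retainNaive(logs, oldestSnapshot));
        System.out.println("with preceding log: " + retainWithPrecedingLog(logs, oldestSnapshot));
    }
}
```

With the files from the example, the first rule keeps no txn log at all, while the second keeps log.100 and therefore the transactions 131-140.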