Thanks. But could the following case occur?
After a snapshot recovery of core A completes, its tlog is still out of date: it contains no new records from the recovery, and it is not cleared. If the just-recovered core (core A) then takes the leader role and another core (core C) tries to recover from it, A's tlog contains only the old entries and not the newest ones. Will core C then do a peersync with only the old records and miss the newest ones?
Also, I think a snapshot recovery happens because the difference between the 2 cores is too large, which means the tlog gap is also too large. So the out-of-date tlog is no longer useful for peersync.
Our testing shows that snapshot recovery does not clean the tlog. The steps are:
1, Core A and core B are 2 replicas of a shard.
2, Core A goes down, and core B takes the leader role. B receives some updates and records them in its tlog.
3, After A comes back up, it recovers from B, and because the difference is too large, A does a snapshot pull recovery. No other updates come in during the snapshot pull recovery. After the snapshot pull recovery, the tlog of A is not updated: it still does NOT contain any of the most recent records from B.
So the tlog is still out of date, although the index of A is already up to date.
4, Core A goes down again, and core B remains the leader. B receives some more updates and records them in its tlog.
5, After A comes back up again, it recovers from B. But it finds that its tlog is still too old, so it does a snapshot recovery again, which is not necessary.
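To make the steps above concrete, here is a small toy model in Python. It is only a sketch of the scenario, not Solr's actual recovery code: the `Core` class, the `recover` and `scenario` functions, the `WINDOW` size, and the `reset_tlog_after_snapshot` flag are all hypothetical names invented for illustration, and the rule "peersync only if the replica's newest tlog entry is within the leader's recent window" is a simplifying assumption.

```python
# Toy model only: NOT Solr's actual recovery code. All names are hypothetical.
from dataclasses import dataclass, field

WINDOW = 3  # assumed number of recent updates a peersync can replay


@dataclass
class Core:
    name: str
    index: set = field(default_factory=set)   # versions present in the index
    tlog: list = field(default_factory=list)  # versions recorded in the tlog

    def update(self, version):
        self.index.add(version)
        self.tlog.append(version)


def recover(replica, leader, reset_tlog_after_snapshot):
    """Return 'peersync' or 'snapshot' and bring the replica's index up to date."""
    recent = leader.tlog[-WINDOW:]
    newest_local = replica.tlog[-1] if replica.tlog else None
    if newest_local in recent:
        # The replica's tlog overlaps the leader's recent window: replay the gap.
        for v in recent[recent.index(newest_local) + 1:]:
            replica.update(v)
        return "peersync"
    # Gap too large: copy the whole index from the leader.
    replica.index = set(leader.index)
    if reset_tlog_after_snapshot:
        # Hypothetical fix: make the tlog reflect the leader's recent updates.
        replica.tlog = list(recent)
    return "snapshot"


def scenario(reset_tlog_after_snapshot):
    a, b = Core("A"), Core("B")
    for v in range(1, 3):       # step 1: both replicas are in sync
        a.update(v)
        b.update(v)
    for v in range(3, 11):      # step 2: A is down, B takes many updates
        b.update(v)
    first = recover(a, b, reset_tlog_after_snapshot)   # step 3
    for v in range(11, 13):     # step 4: A is down again, B takes a few updates
        b.update(v)
    second = recover(a, b, reset_tlog_after_snapshot)  # step 5
    return first, second, a.index == b.index


print(scenario(False))  # stale tlog is kept after the snapshot
print(scenario(True))   # tlog is refreshed after the snapshot
```

In this model, keeping the stale tlog forces a second snapshot in step 5 even though only two updates are missing, while refreshing the tlog after the first snapshot lets step 5 finish with a peersync.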
Do you agree? Thanks!