While we have plugged most of these holes, there appears to be another that is fairly rare.
I've seen it play out in a couple of ways in tests, but part of the problem appears to be that even when we decide we need a file and download it, we ignore the failure if we then cannot move it into place because a file with that name already exists.
I'm working on a fix that does two things:
- Fail a replication attempt if we cannot move a file into place because it already exists.
- If a replication attempt during recovery fails, on the next attempt force a full replication to a new directory.
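The two changes above can be sketched roughly as follows. This is a minimal illustration, not the actual patch. It assumes a file-based replication loop built on `java.nio.file`. The class `ReplicationSketch`, its methods, the map of "downloaded" file contents, and the `index.` directory prefix are all hypothetical names chosen for the example. The move into place omits `REPLACE_EXISTING`, so an existing target surfaces as an `IOException` and fails the attempt. A failure during recovery sets a flag that makes the next attempt copy everything into a fresh directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/**
 * Hypothetical sketch of the two-part fix. Class, method, and directory
 * names are illustrative assumptions, not the project's real API.
 */
public class ReplicationSketch {

    // Set after a failed attempt during recovery so the next attempt does a full copy.
    private boolean forceFullReplicationNextTime = false;
    // Directory used by the last successful attempt (null until one succeeds).
    private Path lastTargetDir;

    /** Moves a downloaded temp file into place, refusing to overwrite an existing file. */
    static void moveIntoPlace(Path tmpFile, Path target) throws IOException {
        // Without StandardCopyOption.REPLACE_EXISTING, Files.move throws an IOException
        // (FileAlreadyExistsException on the default file system) if target exists,
        // so the failure is reported instead of silently ignored.
        Files.move(tmpFile, target);
    }

    /**
     * Runs one replication attempt. "files" stands in for the files we decided to
     * download (name -> contents). Returns true only if every file was moved into place.
     */
    public boolean replicate(Path dataDir, Path currentIndexDir,
                             Map<String, byte[]> files, boolean duringRecovery) throws IOException {
        Path targetDir = currentIndexDir;
        try {
            if (forceFullReplicationNextTime) {
                // Full replication into a brand-new directory, so no stale file is in the way.
                targetDir = Files.createTempDirectory(dataDir, "index.");
            }
            for (Map.Entry<String, byte[]> e : files.entrySet()) {
                Path tmp = Files.createTempFile(dataDir, "download", ".tmp");
                try {
                    Files.write(tmp, e.getValue());
                    moveIntoPlace(tmp, targetDir.resolve(e.getKey()));
                } finally {
                    // No-op after a successful move; removes the leftover download otherwise.
                    Files.deleteIfExists(tmp);
                }
            }
        } catch (IOException ex) {
            // Change 1: a file we could not move into place fails the whole attempt.
            if (duringRecovery) {
                // Change 2: the next attempt during recovery replicates fully to a new directory.
                forceFullReplicationNextTime = true;
            }
            return false;
        }
        forceFullReplicationNextTime = false;
        lastTargetDir = targetDir;
        return true;
    }

    public boolean isFullReplicationForced() {
        return forceFullReplicationNextTime;
    }

    public Path lastTargetDir() {
        return lastTargetDir;
    }
}
```

Writing the full copy to a new directory, rather than retrying into the existing one, sidesteps whatever stale file blocked the move, at the cost of transferring every file again.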