Details
Improvement
Status: Closed
Major
Resolution: Fixed
1.0.2
None
Regular Contributors Level (Easy to Medium)
Description
I've thought about this long and hard and probably should have submitted the bug a long time ago. I've also run this in production for months.
When a checkpoint conflict occurs, aborting is almost always the right thing to do.
A rev mismatch could mean there are two conflicting replications (continuous and one-shot) running between the same hosts. Without reloading the history documents, checkpoints will continue to fail forever. This could leave us in a state with many replicated changes but no checkpoints.
Similarly, a successful checkpoint whose response was lost or timed out could cause the same situation.
Since the supervisor will restart the replication anyway, I think it's safer to abort and retry.
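To illustrate the reasoning above, here is a minimal sketch in Python of why a cached rev keeps failing and why a restart fixes it. This is a toy model, not the replicator's actual code: `CheckpointStore`, `Replication`, `Conflict` and `checkpoint_with_restart` are all hypothetical names, and the integer rev stands in for a real document revision.

```python
class Conflict(Exception):
    """Raised when a write carries a stale rev (analogous to a 409)."""


class CheckpointStore:
    """Toy stand-in for the document that holds the replication history."""

    def __init__(self):
        self.rev = 0
        self.seq = 0

    def read(self):
        return self.rev, self.seq

    def write(self, expected_rev, seq):
        if expected_rev != self.rev:
            raise Conflict(f"expected rev {expected_rev}, store has {self.rev}")
        self.rev += 1
        self.seq = seq
        return self.rev


class Replication:
    """Hypothetical replication that loads history once, at startup."""

    def __init__(self, store):
        self.store = store
        # The rev is cached here and never re-read while running.
        self.rev, self.seq = store.read()

    def checkpoint(self, seq):
        # Deliberately no conflict handling: a Conflict propagates and
        # aborts this replication instead of being logged and ignored.
        self.rev = self.store.write(self.rev, seq)
        self.seq = seq


def checkpoint_with_restart(store, rep, seq):
    """Assumed supervisor behaviour: on conflict, start a fresh replication."""
    try:
        rep.checkpoint(seq)
        return rep
    except Conflict:
        fresh = Replication(store)  # reloads history, picks up the current rev
        fresh.checkpoint(seq)
        return fresh
```

If a second writer bumps the rev behind a running `Replication`, every later `checkpoint` call on that instance raises `Conflict`, because the cached rev never changes. Only a new instance, which re-reads the history, can checkpoint again. That is the case for aborting rather than continuing.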