Details
- Bug
- Status: Resolved
- Minor
- Resolution: Fixed
Description
Reported by benjaminhindman. It's fairly non-intuitive that a 'fill' retry is required when recovering an empty log. Moreover, since the retry is done via a 'delay', you can't pause the clock before calling Log::Writer::start! The following tests show the multiple calls, and at one point I added comments to explain the very esoteric reasoning here. Here is the sequence of events:
First, a replica is recovered with an empty log, but position 0 is always treated as a hole:
Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
At this point the replica assumes it was promised to a coordinator with proposal 0 (that's the default metadata). Then an implicit promise request is made with proposal 1.
Replica received implicit promise request with proposal 1
Then the coordinator (via FillProcess) tries to fill the hole (position 0) explicitly:
Coordinator attemping to fill missing position
And the replica receives the request:
Replica received explicit promise request for position 0 with proposal 1
But the fill must be retried: position 0 is implicitly promised to proposer 1 (the same coordinator!), yet the replica won't accept an explicit promise request carrying that same proposal number, because it might not be safe. So the FillProcess tries again with proposal number 2 (after the delay). While correct, this seems unfortunate (and not intuitive).
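The sequence above can be replayed with a small toy model. This is only a sketch, not the Mesos implementation: the `Replica` class, the `fill` helper, and `run_scenario` are hypothetical names. It assumes the replica rejects any promise request whose proposal number is not strictly greater than the one it has already promised, which is what the description suggests.

```python
class Replica:
    """Toy model of the replica's promise bookkeeping (hypothetical, not the Mesos class)."""

    def __init__(self):
        self.promised = 0        # default metadata: promised to proposal 0
        self.promised_at = {}    # explicit promises recorded per log position

    def implicit_promise(self, proposal):
        # Accept only a strictly higher proposal, then remember it.
        if proposal <= self.promised:
            return False
        self.promised = proposal
        return True

    def explicit_promise(self, position, proposal):
        # Assumption: a position with no learned action is covered by the
        # implicit promise, so an equal proposal number is rejected.
        if proposal <= self.promised:
            return False
        self.promised_at[position] = proposal
        return True


def fill(replica, position, proposal):
    """Retry the explicit promise with increasing proposal numbers.

    The real retry happens after a delay; here it is an immediate loop.
    Returns the list of (proposal, accepted) attempts.
    """
    attempts = []
    while True:
        accepted = replica.explicit_promise(position, proposal)
        attempts.append((proposal, accepted))
        if accepted:
            return attempts
        proposal += 1


def run_scenario():
    replica = Replica()              # recovered empty log, position 0 is a hole
    replica.implicit_promise(1)      # coordinator's implicit promise request
    return fill(replica, 0, 1)       # coordinator fills position 0


if __name__ == "__main__":
    print(run_scenario())
```

In this model the first fill attempt with proposal 1 is rejected and the second attempt with proposal 2 succeeds, matching the retry described in the report.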