Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
1.0.1
None
Description
The following sequence causes a CHECK failure:
- a tablet server receives a CONFIG_CHANGE operation
- the tablet server commits the operation (writing the new consensus config to disk), but crashes before it can write the associated COMMIT message to the log
- the server stays down long enough that it is removed from the configuration
- when it comes back up, it sees the CONFIG_CHANGE again as a pending replicate. When the operation is added to PendingRounds, it is ignored because this configuration is already committed, so no pending config is set.
- the tserver receives a DeleteTablet request from the master because it is no longer part of the configuration
- when trying to abort the still-pending operation, it hits the CHECK "Aborting CHANGE_CONFIG_OP but there was no pending config set."
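
The sequence above can be modelled with a small toy state machine. This is only an illustrative sketch in Python, not the actual tablet server code: `ToyReplicaState`, `CheckFailure`, and the method names are hypothetical, and the "already committed" test (comparing op indexes) is an assumption made for the sake of the example.

```python
class CheckFailure(Exception):
    """Hypothetical stand-in for a fatal CHECK failure."""


class ToyReplicaState:
    """Toy model of a replica's config bookkeeping (names are hypothetical)."""

    def __init__(self, committed_config_index):
        # Index of the config change already persisted to disk before the crash.
        self.committed_config_index = committed_config_index
        self.pending_config = None
        self.pending_ops = []

    def add_pending_config_change(self, op_index, new_config):
        # The replayed op is still tracked as pending...
        self.pending_ops.append(op_index)
        # ...but if its config is already committed, no pending config is set.
        # (Assumed rule for this sketch: compare op indexes.)
        if op_index <= self.committed_config_index:
            return False
        self.pending_config = new_config
        return True

    def abort_pending_ops(self):
        # Tablet deletion aborts every pending op; the abort path assumes
        # each pending config change also set a pending config.
        for _ in self.pending_ops:
            if self.pending_config is None:
                raise CheckFailure(
                    "Aborting CHANGE_CONFIG_OP but there was no pending config set."
                )
            self.pending_config = None
        self.pending_ops.clear()


def reproduce():
    """Replay the steps from the report; returns the CHECK message hit."""
    # Config from op 7 reached disk, but the COMMIT message did not reach the log.
    state = ToyReplicaState(committed_config_index=7)
    # On restart, op 7 is replayed as a pending replicate and ignored.
    state.add_pending_config_change(7, ["A", "B", "C"])
    try:
        # DeleteTablet arrives and aborts the pending op.
        state.abort_pending_ops()
    except CheckFailure as e:
        return str(e)
    return None
```

Calling `reproduce()` in this model returns the CHECK message, because the replayed op is tracked as pending without a matching pending config.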