Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Cannot Reproduce
- Private Beta
Description
ClientTest.TestScanFaultTolerance is producing this deadlock scenario pretty reliably:
- TS A is leader and replicates a CONFIG_CHANGE to a majority
- TS A shuts down before it sends out the updated commit index
- TS B is elected leader, replicates its own CONFIG_CHANGE to a majority, and likewise shuts down before it sends out the updated commit index
- TS C now has two CONFIG_CHANGE ops replicated but neither committed. The second one is blocking the prepare queue trying to acquire the config_sem. The first one is not getting committed because the other servers are down
TS C now can't shut down.
I think the same could happen even with just two nodes – if a node is elected leader while it has an uncommitted CONFIG_CHANGE prepared, it will never be able to commit its own CONFIG_CHANGE, because that op will get blocked in PREPARE.
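To make the blocking pattern concrete, here is a rough toy model of the scenario, not the actual consensus code. It assumes config_sem behaves like a single-permit semaphore that a CONFIG_CHANGE takes in PREPARE and gives back only on commit. The `Replica` class, its method names, and the op labels are hypothetical, and a short timeout stands in for what would otherwise be an indefinite block.

```python
import threading


class Replica:
    """Toy model of one replica's prepare path (hypothetical, for illustration)."""

    def __init__(self):
        # Assumption: at most one CONFIG_CHANGE may be in flight at a time.
        self.config_sem = threading.Semaphore(1)
        self.prepared = []

    def prepare_config_change(self, op_id, timeout=0.05):
        """Try to prepare a CONFIG_CHANGE.

        Returns False if the semaphore is still held after `timeout` seconds;
        in the reported scenario this wait would never end.
        """
        if not self.config_sem.acquire(timeout=timeout):
            return False
        self.prepared.append(op_id)
        return True

    def commit(self, op_id):
        """Commit a prepared op and release the semaphore it was holding."""
        self.prepared.remove(op_id)
        self.config_sem.release()


def simulate_stuck_replica():
    """TS C receives two CONFIG_CHANGE ops and never learns that either committed."""
    ts_c = Replica()
    first = ts_c.prepare_config_change("config_change_from_A")
    # The commit index for the first op never arrives, so commit() is never
    # called and the second op cannot get past PREPARE.
    second = ts_c.prepare_config_change("config_change_from_B")
    return first, second
```

In this model `simulate_stuck_replica()` shows the first op prepared and the second one stuck. The two-node variant has the same shape: the newly elected leader's own CONFIG_CHANGE plays the role of the second op, and nothing is left to commit the first.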
Issue Links
- is duplicated by KUDU-768 Possible self-deadlock in RaftConsensus::Shutdown() (Resolved)