Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Public beta
None
None
Description
I am hitting an issue where tablets with transactions in the REPLICATING state get stuck in a tablet flush call at shutdown time. This happens when the leader cannot finish replicating those transactions, for example because the cluster is unavailable or the leader is down. When the maintenance manager shuts down, it waits for its outstanding operations to complete; if one of those operations is a flush that never completes, shutdown hangs.
We think the solution is to shut down consensus before shutting down the maintenance manager, and to have consensus abort the in-flight transactions back to the committed state at shutdown time.
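The shutdown-ordering problem and the proposed fix can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the project's actual code: `HypotheticalConsensus`, `HypotheticalMaintenanceManager` and `shutdown_tablet` are invented names. The flush is modeled as an operation that blocks until no transaction is REPLICATING, and consensus shutdown is modeled as aborting all in-flight transactions. The timeouts exist only so the demo terminates; in the reported bug the flush waits indefinitely.

```python
import threading


class HypotheticalConsensus:
    """Toy stand-in for consensus holding transactions in REPLICATING state."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0

    def start_transaction(self):
        with self._cond:
            self._in_flight += 1

    def wait_for_in_flights(self, timeout):
        # Assumed flush behavior: block until nothing is REPLICATING.
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout=timeout)

    def shutdown(self):
        # Proposed fix: abort in-flight transactions back to the committed
        # state (they are dropped, not committed) and wake any waiters.
        with self._cond:
            self._in_flight = 0
            self._cond.notify_all()


class HypotheticalMaintenanceManager:
    """Toy maintenance manager that runs ops on threads."""

    def __init__(self):
        self._ops = []

    def run_op(self, fn):
        t = threading.Thread(target=fn)
        t.start()
        self._ops.append(t)

    def shutdown(self, timeout):
        # Wait for outstanding ops; False means some op is still running.
        for t in self._ops:
            t.join(timeout)
        return not any(t.is_alive() for t in self._ops)


def shutdown_tablet(consensus_first, mm_timeout=0.5):
    """Returns True if the maintenance manager shut down cleanly."""
    consensus = HypotheticalConsensus()
    consensus.start_transaction()  # the leader cannot replicate it, so it stays in flight
    mm = HypotheticalMaintenanceManager()
    mm.run_op(lambda: consensus.wait_for_in_flights(timeout=5.0))  # the flush op

    if consensus_first:
        consensus.shutdown()                 # aborts in-flights, unblocking the flush
        return mm.shutdown(mm_timeout)

    clean = mm.shutdown(mm_timeout)          # flush is still blocked here
    consensus.shutdown()                     # demo cleanup only, so the thread exits
    mm.shutdown(None)
    return clean
```

With the original ordering, `shutdown_tablet(consensus_first=False)` reports that the maintenance manager could not finish. Shutting consensus down first lets the blocked flush return, so `shutdown_tablet(consensus_first=True)` completes cleanly.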