Details
-
Bug
-
Status: Open
-
Normal
-
Resolution: Unresolved
-
None
-
Normal
-
Normal
Description
On a 3-node cluster with RF=2, decommissioning a node may cause quorum write timeout because messaging backlog to decommissioned node is cleared via Gossiper#removeEndpoint() -> OutboundTcpConnection#closeSocket().
(Timeout is less likely to happen with RF=3, because we can afford one less response)
What happened:
1. [WriteStage] before the leaving node is removed from tokenmetadata, the write endpoints are generated ( leaving endpoint is included )
2. [GossipStage] the leaving node is removed from tokenmetadata, no more future write handler will include leaving endpoints
3. [WriteStage] write handlers sends messages to messaging-service backlog
4. [GossipStage] messaging-service backlog is cleared, messages are not sent and connection closed
5. [WriteStage] write time out
We can avoid it by delaying to destroy messaging connection so that messages are sent and responded. This patch also avoids reopening already closed connection on MessagingService#convict().
New messaging framework rewrite in Trunk avoids the issues by not clearing messaging backlog.