- Type: Bug
- Status: Resolved
- Priority: Low
- Resolution: Fixed
- Fix Version/s: 3.0.16, 3.11.2, 4.0, 4.0-alpha1
- Component/s: Legacy/Distributed Metadata, Legacy/Streaming and Messaging
- Labels: None
- Severity: Low
- Since Version:
After upgrading Cassandra from 2.0.12 to 2.0.15, whenever we killed a Cassandra process (with SIGTERM), some other nodes kept a connection to the killed node in the CLOSE_WAIT state on port 7000 for about 5-20 minutes.
As a result, when we started the killed node again, the other nodes could not complete a handshake with it because of the connections stuck in CLOSE_WAIT, so the restarted node and those nodes kept seeing each other as DOWN until the initial connection expired.
The problem did not happen if I ran nodetool disablegossip before killing the node.
I was able to fix this issue by reverting the CASSANDRA-8336 commits (including CASSANDRA-9238). With those reverted, Cassandra closes its connections correctly when killed with -TERM, but leaves connections in the CLOSE_WAIT state if I run nodetool disablethrift before killing the nodes.
I did not try to reproduce the problem in a clean environment.
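For anyone trying to confirm the same symptom on a surviving node, here is a minimal Python sketch that filters netstat-style output for CLOSE_WAIT connections touching port 7000. It is not part of Cassandra or of the fix; the function name, the sample addresses, and the assumed column layout (Linux `netstat -tan`: proto, recv-q, send-q, local, remote, state) are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical sample in the column layout printed by Linux `netstat -tan`.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.1:7000           0.0.0.0:*               LISTEN
tcp        1      0 10.0.0.1:41522          10.0.0.2:7000           CLOSE_WAIT
tcp        0      0 10.0.0.1:9042           10.0.0.9:55010          ESTABLISHED
tcp        0      0 10.0.0.1:9160           10.0.0.9:55011          CLOSE_WAIT
"""


def close_wait_on_port(netstat_output: str, port: int = 7000) -> List[Tuple[str, str]]:
    """Return (local, remote) address pairs in CLOSE_WAIT where either end uses `port`."""
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a TCP row with a state column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "CLOSE_WAIT":
            continue
        # The port is whatever follows the last ':' in each address.
        ports = {local.rsplit(":", 1)[-1], remote.rsplit(":", 1)[-1]}
        if str(port) in ports:
            matches.append((local, remote))
    return matches


if __name__ == "__main__":
    for local, remote in close_wait_on_port(SAMPLE):
        print(local, "->", remote, "CLOSE_WAIT")
```

On a live node the same function could be fed the text captured from `netstat -tan` instead of `SAMPLE`. Running it before and after restarting the killed node would show whether the stale connections are still present.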
- relates to: CASSANDRA-8336 Add shutdown gossip state to prevent timeouts during rolling restarts (Resolved)