Details
- Type: Bug
- Status: Open
- Priority: Normal
- Resolution: Unresolved
- Fix Version/s: None
- Bug Category: Correctness - Transient Incorrect Response
- Severity: Normal
- Complexity: Normal
- Discovered By: User Report
- Platform: All
- Impacts: None
Description
We have recently encountered a recurring issue, in which a decommissioned node's old IP reappears, while testing decommissions on some of our Kubernetes-based Cassandra staging clusters.
Issue Description
In Kubernetes, a Cassandra node can change its IP at each pod bounce. We have noticed that this behavior, combined with a decommission operation, can leave the cluster in an erroneous state.
Consider the following situation: a Cassandra node node1, with hostId1, owning 20.5% of the token ring, bounces and switches IP (old_IP → new_IP). After a couple of gossip rounds, every other node's nodetool status output includes a new_IP UN entry owning 20.5% of the token ring, and no old_IP entry.
Shortly after the bounce, node1 is decommissioned. Our cluster does not hold much data, so the decommission completes quickly. Logs on other nodes start acknowledging that node1 has left, and soon the new_IP UL entry disappears from nodetool status. node1's pod is then deleted.
About a minute later, the cluster enters the erroneous state: an old_IP DN entry reappears in nodetool status, owning 20.5% of the token ring. No node owns this IP anymore, and according to the logs, old_IP is still associated with hostId1.
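The sequence above can be sketched as a toy timeline. This is a plain-Python model of the cluster view, not Cassandra code; all structures and names (cluster_view, the ownership field) are illustrative.

```python
# Toy timeline of the reported bug; illustrative only, not Cassandra code.
cluster_view = {"old_IP": {"host_id": "hostId1", "ownership": 20.5, "status": "UN"}}

# 1. Pod bounce: node1 comes back under new_IP with the same hostId.
cluster_view.pop("old_IP")
cluster_view["new_IP"] = {"host_id": "hostId1", "ownership": 20.5, "status": "UN"}

# 2. Decommission completes: new_IP leaves the ring and its entry disappears.
cluster_view.pop("new_IP")

# 3. Erroneous state: stale gossip resurrects old_IP as DN, still bound to
#    hostId1 and its 20.5% of the ring, even though no pod owns that IP.
cluster_view["old_IP"] = {"host_id": "hostId1", "ownership": 20.5, "status": "DN"}
```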
Issue Root Cause
By digging through Cassandra logs and re-testing this scenario repeatedly, we have reached the following conclusions:
- Other nodes continue exchanging gossip about old_IP, even after it becomes a fatClient.
- The fatClient timeout and subsequent quarantine do not stop old_IP from reappearing in a node's gossip state once its quarantine is over. We believe this is due to a misalignment across nodes of old_IP's expiration time.
- Once new_IP has left the cluster and old_IP's next gossip state message is received by a node, StorageService no longer faces collisions (or faces them only with an even older IP) for hostId1 and its corresponding tokens. As a result, old_IP regains ownership of 20.5% of the token ring.
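The resurrection mechanism can be illustrated with a toy model in which each node expires old_IP's gossip state at its own time. The Node class, timings, and quarantine length below are illustrative assumptions, not Cassandra's actual implementation.

```python
# Toy model of the resurrection mechanism: each node drops old_IP's gossip
# state at its own expiration time. Class, timings, and quarantine length
# are illustrative assumptions, not Cassandra's implementation.
QUARANTINE_SECS = 60

class Node:
    def __init__(self, name, expire_at):
        self.name = name
        self.state = {"old_IP": {"host_id": "hostId1"}}  # lingering fatClient state
        self.expire_at = expire_at       # when this node expires old_IP
        self.quarantine_until = None     # while set, re-adds are rejected

    def tick(self, now):
        if "old_IP" in self.state and now >= self.expire_at:
            del self.state["old_IP"]
            self.quarantine_until = now + QUARANTINE_SECS

    def receive_gossip(self, now, endpoint, ep_state):
        # Once quarantine lapses, nothing stops the stale endpoint coming back.
        if self.quarantine_until is not None and now < self.quarantine_until:
            return
        self.state[endpoint] = ep_state

# Nodes A and B expire old_IP at different times (the misalignment).
a, b = Node("A", expire_at=10), Node("B", expire_at=100)
a.tick(10)   # A drops old_IP and quarantines it until t=70
b.tick(10)   # B keeps old_IP and keeps gossiping about it
a.receive_gossip(80, "old_IP", {"host_id": "hostId1"})
assert "old_IP" in a.state   # after quarantine, old_IP reappears on A
```

Because the expiration times never line up cluster-wide, there is always some node still willing to re-gossip old_IP after another node's quarantine has lapsed.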
Proposed fix
Following the above investigation, we are considering implementing the following fix:
When a node receives a gossip status change with STATE_LEFT for a leaving endpoint new_IP, before evicting new_IP from the token ring, purge from gossip (i.e. evictFromMembership) all endpoints that meet all of the following criteria:
- endpointStateMap contains this endpoint
- The endpoint is not currently a token owner (!tokenMetadata.isMember(endpoint))
- The endpoint’s hostId matches the hostId of new_IP
- The endpoint is older than new_IP (per Gossiper.instance.compareEndpointStartup)
- The endpoint’s token range (from endpointStateMap) intersects with new_IP’s
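The criteria above can be sketched as a single predicate. This is a plain-Python model for clarity (the real fix would live in Cassandra's Java gossip/StorageService code); all parameter names and data shapes are simplified stand-ins, not Cassandra APIs.

```python
# Sketch of the proposed eviction check; simplified stand-in structures,
# not actual Cassandra APIs.
def should_purge(endpoint, leaving, endpoint_state_map, token_members,
                 startup_order, tokens):
    """True if `endpoint` should be evicted from gossip when `leaving`
    (new_IP) is removed from the ring with STATE_LEFT."""
    return bool(
        endpoint in endpoint_state_map                        # known in gossip
        and endpoint not in token_members                     # not a token owner
        and endpoint_state_map[endpoint]["host_id"]
            == endpoint_state_map[leaving]["host_id"]         # same hostId
        and startup_order[endpoint] < startup_order[leaving]  # older endpoint
        and tokens[endpoint] & tokens[leaving]                # token overlap
    )

# Hypothetical data mirroring the scenario above:
state = {"old_IP": {"host_id": "hostId1"}, "new_IP": {"host_id": "hostId1"}}
startup = {"old_IP": 1, "new_IP": 2}          # old_IP started first
toks = {"old_IP": {5, 6}, "new_IP": {5, 6}}   # same token range
assert should_purge("old_IP", "new_IP", state, {"new_IP"}, startup, toks)
```

Any endpoint satisfying the predicate would be evicted before new_IP itself is removed, so no stale state survives to be re-gossiped.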
The intention of this modification is to force nodes to realign on old_IP's expiration and to expunge it from gossip, so that it does not reappear after new_IP leaves the ring.
Another approach we have been considering is expunging old_IP at the moment StorageService resolves the collision.
Attachments
Issue Links
- relates to:
  - CASSANDRA-8260 Replacing a node can leave the old node in system.peers on the replacement (Resolved)
  - CASSANDRA-8304 Explore evicting replacement state sooner (Open)