Details
-
Bug
-
Status: Resolved
-
Normal
-
Resolution: Fixed
-
None
-
Correctness - Recoverable Corruption / Loss
-
Normal
-
Normal
-
User Report
-
All
-
None
-
Description
In CASSANDRA-8838, pauloricardomg asked: "hints will not be stored to the bootstrapping node after RING_DELAY, since it will [be] evicted from the TMD pending ranges. Should we create a ticket to address this?"
CASSANDRA-15264 covers the most likely cause of such situations, in which the Cassandra daemon on the bootstrapping node crashes completely. Based on testing with kill -STOP on a bootstrapping Cassandra JVM, I believe it is also possible to remove token metadata (and thus pending ranges, and thus hints) for a bootstrapping node simply by affecting its status in the failure detector.
A node in the cluster sees the bootstrapping node this way:
INFO [GossipStage:1] 2019-11-27 20:41:41,101 Gossiper.java:1111 - Node /PUBLIC-IP is now part of the cluster
INFO [GossipStage:1] 2019-11-27 20:41:41,199 Gossiper.java:1073 - InetAddress /PUBLIC-IP is now UP
INFO [HANDSHAKE-/PRIVATE-IP] 2019-11-27 20:41:41,412 OutboundTcpConnection.java:565 - Handshaking version with /PRIVATE-IP
INFO [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,019 StreamResultFuture.java:112 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 ID#0] Creating new streaming plan for Bootstrap
INFO [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,020 StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, ID#0] Received streaming plan for Bootstrap
INFO [STREAM-INIT-/PRIVATE-IP:56003] 2019-11-27 20:42:10,112 StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, ID#0] Received streaming plan for Bootstrap
INFO [STREAM-IN-/PUBLIC-IP] 2019-11-27 20:42:10,179 StreamResultFuture.java:169 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 ID#0] Prepare completed. Receiving 0 files(0 bytes), sending 833 files(139744616815 bytes)
INFO [GossipStage:1] 2019-11-27 20:54:47,547 Gossiper.java:1089 - InetAddress /PUBLIC-IP is now DOWN
INFO [GossipTasks:1] 2019-11-27 20:54:57,551 Gossiper.java:849 - FatClient /PUBLIC-IP has been silent for 30000ms, removing from gossip
Since the bootstrapping node has no tokens, it is treated like a fat client and is removed from the ring. For correctness, I believe we must keep storing hints for the downed bootstrapping node until either it is assassinated or a replacement attempts to bootstrap for the same token.
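To make the distinction concrete, here is a minimal sketch of the eviction decision as described in this report, next to the behaviour proposed above. It is not Cassandra's Gossiper code: the class, enum, and method names are hypothetical, and the 30000 ms threshold is simply taken from the log line above.

```java
// Illustrative model only; none of these names exist in Cassandra.
class FatClientEvictionSketch {

    // Hypothetical simplification of what a peer knows about an endpoint.
    enum NodeState { NORMAL, BOOTSTRAPPING, CLIENT_ONLY }

    // Assumption: threshold copied from "has been silent for 30000ms" in the log.
    static final long FAT_CLIENT_TIMEOUT_MS = 30_000L;

    // Behaviour as reported: any endpoint without tokens that stays silent
    // past the timeout is treated as a fat client and removed from gossip.
    static boolean evictedAsReported(boolean hasTokens, long silentMs) {
        return !hasTokens && silentMs >= FAT_CLIENT_TIMEOUT_MS;
    }

    // Behaviour as proposed: a bootstrapping node is never evicted by the
    // silence timeout, so its pending ranges (and hints) are retained.
    static boolean evictedAsProposed(NodeState state, boolean hasTokens, long silentMs) {
        if (state == NodeState.BOOTSTRAPPING) {
            return false;
        }
        return !hasTokens && silentMs >= FAT_CLIENT_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        // A paused (kill -STOP) bootstrapping node: no tokens yet, silent for 30 s.
        System.out.println("reported: " + evictedAsReported(false, 30_000L));
        System.out.println("proposed: "
                + evictedAsProposed(NodeState.BOOTSTRAPPING, false, 30_000L));
    }
}
```

In this model, a genuine fat client (CLIENT_ONLY, no tokens) is still evicted after the timeout under both rules; only the bootstrapping case differs. Removal by assassination or by a replacement bootstrapping for the same token is left out of the sketch.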