Apache Cassandra / CASSANDRA-15439

Token metadata for bootstrapping nodes is lost under temporary failures

Details

    Description

      In CASSANDRA-8838, pauloricardomg asked: "hints will not be stored to the bootstrapping node after RING_DELAY, since it will [be] evicted from the TMD pending ranges. Should we create a ticket to address this?"

      CASSANDRA-15264 covers the most likely cause of this situation, in which the Cassandra daemon on the bootstrapping node crashes completely. Based on testing with kill -STOP on a bootstrapping Cassandra JVM, I believe it is also possible to lose the token metadata (and thus the pending ranges, and thus hints) for a bootstrapping node simply by changing its status in the failure detector, with no crash involved.

      A node in the cluster sees the bootstrapping node this way:

      INFO  [GossipStage:1] 2019-11-27 20:41:41,101 Gossiper.java:1111 - Node /PUBLIC-IP is now part of the cluster
      INFO  [GossipStage:1] 2019-11-27 20:41:41,199 Gossiper.java:1073 - InetAddress /PUBLIC-IP is now UP
      INFO  [HANDSHAKE-/PRIVATE-IP] 2019-11-27 20:41:41,412 OutboundTcpConnection.java:565 - Handshaking version with /PRIVATE-IP
      INFO  [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,019 StreamResultFuture.java:112 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 ID#0] Creating new streaming plan for Bootstrap
      INFO  [STREAM-INIT-/PRIVATE-IP:21233] 2019-11-27 20:42:10,020 StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, ID#0] Received streaming plan for Bootstrap
      INFO  [STREAM-INIT-/PRIVATE-IP:56003] 2019-11-27 20:42:10,112 StreamResultFuture.java:119 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4, ID#0] Received streaming plan for Bootstrap
      INFO  [STREAM-IN-/PUBLIC-IP] 2019-11-27 20:42:10,179 StreamResultFuture.java:169 - [Stream #6219a950-1156-11ea-b45d-4d30364576c4 ID#0] Prepare completed. Receiving 0 files(0 bytes), sending 833 files(139744616815 bytes)
      INFO  [GossipStage:1] 2019-11-27 20:54:47,547 Gossiper.java:1089 - InetAddress /PUBLIC-IP is now DOWN
      INFO  [GossipTasks:1] 2019-11-27 20:54:57,551 Gossiper.java:849 - FatClient /PUBLIC-IP has been silent for 30000ms, removing from gossip
      

      Since the bootstrapping node has no tokens, it is treated like a fat client and removed from the ring. For correctness, I believe we must keep storing hints for the downed bootstrapping node until it is either assassinated or a replacement attempts to bootstrap for the same token.
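
      To make the failure mode concrete, here is a minimal sketch of the eviction rule as I understand it from the log above. It is a simplified model, not Cassandra's actual Gossiper code: the class and method names are hypothetical, and the 30000 ms threshold is taken from the "has been silent for 30000ms" log line rather than from the source.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical, simplified model of the behavior described above; not Cassandra's Gossiper. */
final class FatClientEvictionSketch {
    // Assumed threshold, copied from the "silent for 30000ms" log line.
    static final long FAT_CLIENT_TIMEOUT_MS = 30_000L;

    /** A peer that owns no tokens looks like a fat client, even if it is mid-bootstrap. */
    static boolean looksLikeFatClient(Set<Long> ownedTokens) {
        return ownedTokens.isEmpty();
    }

    /** Evict a token-less peer once it has been silent for longer than the timeout. */
    static boolean shouldEvict(Set<Long> ownedTokens, long lastHeardMs, long nowMs) {
        return looksLikeFatClient(ownedTokens)
            && (nowMs - lastHeardMs) > FAT_CLIENT_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        Set<Long> bootstrapping = Collections.<Long>emptySet(); // no tokens owned yet
        Set<Long> normalMember = Collections.singleton(42L);    // owns a token

        System.out.println(shouldEvict(bootstrapping, 0L, 30_001L)); // true
        System.out.println(shouldEvict(bootstrapping, 0L, 10_000L)); // false
        System.out.println(shouldEvict(normalMember, 0L, 30_001L));  // false
    }
}
```

      In this model nothing distinguishes a paused bootstrapping node from a genuine fat client, which is why its pending ranges (and the hints that depend on them) disappear after the timeout.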

            People

              rmhuffman Raymond Huffman
              josnyder Josh Snyder
              Raymond Huffman
              Brandon Williams, David Capwell
              Votes: 0
              Watchers: 5

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 10m