  1. Cassandra
  2. CASSANDRA-13700

Heartbeats can cause gossip information to go permanently missing on certain nodes



    • Availability - Unavailable
    • Critical


      In Gossiper.getStateForVersionBiggerThan, we add the HeartBeatState from the corresponding EndpointState to the EndpointState we are about to send. When we're getting state for ourselves, this means we add a reference to the live, local HeartBeatState rather than a copy. Once we've built a message (an Ack in the Syn handler, or an Ack2 in the Ack handler), we hand it to the MessagingService. If the MessagingService is sufficiently slow, the GossipTask may run before the Ack or Ack2 is serialized. When the GossipTask acquires the gossip taskLock, it may increment the HeartBeatState version of the local node as stored in the endpoint state map. Then, when we finally serialize the Ack or Ack2, we follow the reference to the HeartBeatState and serialize it with a higher version than we saw when constructing the message.
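The aliasing described above can be illustrated with a minimal, single-threaded sketch. The class and method names below (HeartBeat, QueuedMessage, bump, copy) are hypothetical stand-ins invented for this illustration, not Cassandra's actual classes, and the interleaving is forced by statement order rather than by real threads. The second method shows a defensive copy purely as a contrast; it is one possible mitigation, not necessarily the fix adopted for this issue.

```java
// Hypothetical, simplified stand-ins; these are NOT Cassandra's real classes.
final class HeartbeatAliasingSketch {

    /** Mutable heartbeat, loosely modelled on the HeartBeatState described above. */
    static final class HeartBeat {
        private int version;
        HeartBeat(int version) { this.version = version; }
        void bump(int newVersion) { version = newVersion; }
        int version() { return version; }
        HeartBeat copy() { return new HeartBeat(version); }
    }

    /** A queued message whose payload is only read when it is serialized. */
    static final class QueuedMessage {
        private final HeartBeat heartBeat;
        QueuedMessage(HeartBeat heartBeat) { this.heartBeat = heartBeat; }
        int serializeVersion() { return heartBeat.version(); }
    }

    /** The message shares the live heartbeat, so serialization sees the later bump. */
    static int serializedVersionWhenAliased() {
        HeartBeat local = new HeartBeat(4);
        QueuedMessage ack = new QueuedMessage(local); // built while the version is 4
        local.bump(6);                                // periodic task runs before serialization
        return ack.serializeVersion();
    }

    /** The message holds a snapshot, so serialization sees the build-time version. */
    static int serializedVersionWhenCopied() {
        HeartBeat local = new HeartBeat(4);
        QueuedMessage ack = new QueuedMessage(local.copy());
        local.bump(6);
        return ack.serializeVersion();
    }

    public static void main(String[] args) {
        System.out.println("aliased: " + serializedVersionWhenAliased()); // prints "aliased: 6"
        System.out.println("copied:  " + serializedVersionWhenCopied());  // prints "copied:  4"
    }
}
```

In the aliased case the serialized version reflects state that did not exist when the message was assembled, which is the inconsistency this report describes.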

      Consider the case where we see HeartBeatState with version 4 when constructing an Ack and send it through the MessagingService. Then, we add some piece of state with version 5 to our local EndpointState. If GossipTask runs and increases the HeartBeatState version to 6 before the MessageOut containing the Ack is serialized, the node receiving the Ack will believe it is current to version 6, despite the fact that it has never received a message containing the ApplicationState tagged with version 5. Since that node will subsequently only ask for state newer than version 6, the version 5 state is not sent to it unless that state is updated again.
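The scenario above can be walked through from the receiver's side with a small sketch. Everything here is an assumption made for illustration: the Sender class, the statesNewerThan helper (a rough analogue of getStateForVersionBiggerThan), and the placeholder state value are hypothetical, and the real gossip exchange involves digests and several message types that are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of the version 4 / 5 / 6 scenario; not Cassandra code.
final class MissedStateSketch {

    /** Sender side: application states keyed by the version at which they were set. */
    static final class Sender {
        int heartBeatVersion = 4;
        final TreeMap<Integer, String> statesByVersion = new TreeMap<>();

        /** States with a version strictly greater than the given one. */
        List<String> statesNewerThan(int version) {
            return new ArrayList<>(statesByVersion.tailMap(version, false).values());
        }
    }

    /**
     * Returns the states the receiver obtains in the round AFTER the delayed Ack.
     * snapshotHeartBeat = false models the aliased heartbeat from the report;
     * true models copying the heartbeat version when the Ack is built.
     */
    static List<String> statesReceivedNextRound(boolean snapshotHeartBeat) {
        Sender sender = new Sender();

        // 1. Ack is built while the heartbeat is at version 4; no newer states exist yet.
        int versionAtBuildTime = sender.heartBeatVersion;

        // 2. Before serialization, new state lands at version 5 and the heartbeat moves to 6.
        sender.statesByVersion.put(5, "placeholder-application-state");
        sender.heartBeatVersion = 6;

        // 3. The Ack is serialized late: aliased reads the live value, snapshot reads the copy.
        int serializedVersion = snapshotHeartBeat ? versionAtBuildTime : sender.heartBeatVersion;

        // 4. The receiver trusts the serialized heartbeat as "what I am current to".
        int receiverKnownVersion = serializedVersion;

        // 5. Next round, the receiver only gets state newer than what it believes it has.
        return sender.statesNewerThan(receiverKnownVersion);
    }

    public static void main(String[] args) {
        System.out.println("aliased:  " + statesReceivedNextRound(false)); // prints "aliased:  []"
        System.out.println("snapshot: " + statesReceivedNextRound(true));
    }
}
```

With the aliased heartbeat, the receiver records version 6 and the next round returns nothing, so the version 5 state is skipped. With a snapshot, the receiver records version 4 and the version 5 state is delivered in the following round.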

      I've reproduced this in several versions; so far, I believe it is possible in all versions.




            jkni Joel Knighton
            Joel Knighton
            Jason Brown