Cassandra / CASSANDRA-14768

Transient Replication: Consistency Level Semantics


Details

    Description

      For a keyspace without transient replication, we always attempt to write to (and write hints for) all logical endpoints, including those that appear to be alive but are not responding (or are perhaps dropping some messages). With transient replication, in this scenario we only write to the transient replicas if a certain period of time elapses without our consistency level being met.

      This doesn’t lead to the same logical behaviour, although technically the guarantees are the same. In the past, you could expect all DCs to reach their own local quorum promptly if, say, only a single node was failing. Now, you could reach QUORUM with only one DC + 1 remote node, and the remote DC will stay out of sync until repair runs. This is even worse for e.g. LOCAL_{QUORUM,ONE}.
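
The arithmetic behind the "one DC + 1 remote node" case can be checked with a small sketch. The topology here (two datacenters named dc1 and dc2, three replicas each) is an assumption for illustration and is not stated in the ticket; the quorum formula is the usual majority, floor(RF / 2) + 1.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of `replication_factor` replicas."""
    return replication_factor // 2 + 1


# Hypothetical topology: two datacenters, three replicas each.
rf_per_dc = {"dc1": 3, "dc2": 3}

# Acks in the scenario described above: all of dc1 plus one node in dc2.
acks_per_dc = {"dc1": 3, "dc2": 1}

# QUORUM is computed over the sum of replicas in all datacenters.
global_quorum = quorum(sum(rf_per_dc.values()))
quorum_met = sum(acks_per_dc.values()) >= global_quorum

# Datacenters that have not reached their own local quorum.
lagging_dcs = [dc for dc, rf in rf_per_dc.items() if acks_per_dc[dc] < quorum(rf)]
```

Under these assumptions the write satisfies QUORUM while dc2 is still short of its own local quorum, which is the gap that would persist until repair.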

      While the guarantees of the system are the same, the actual behaviour is suboptimal. As long as the coordinator and remote DCs are healthy, in my opinion we should do our best to ensure each DC reaches its own quorum, just as a normal write would.

      This probably entails having our write callback handle a failure by not only writing a hint for the endpoint, but also deciding whether the mutation should immediately be sent to a corresponding transient replica.
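
A rough sketch of what such a callback might look like, as a toy model rather than Cassandra code. Every name here (WriteCallback, on_failure, transient_fallback, the endpoint labels) is hypothetical, and the mapping from a full replica to "its" transient replica is assumed to be known up front.

```python
from dataclasses import dataclass, field


@dataclass
class WriteCallback:
    """Toy per-write callback; all names here are hypothetical."""

    # Maps each full replica to a transient replica in the same datacenter.
    transient_fallback: dict
    hints: list = field(default_factory=list)
    sent_to_transient: list = field(default_factory=list)

    def on_failure(self, endpoint: str) -> None:
        # Existing behaviour: record a hint for the failed endpoint.
        self.hints.append(endpoint)
        # Proposed addition: forward the mutation to the corresponding
        # transient replica right away, unless it was already contacted.
        fallback = self.transient_fallback.get(endpoint)
        if fallback is not None and fallback not in self.sent_to_transient:
            self.sent_to_transient.append(fallback)


callback = WriteCallback(
    transient_fallback={
        "dc2-full-1": "dc2-transient",
        "dc2-full-2": "dc2-transient",
    }
)
callback.on_failure("dc2-full-1")
```

The point of the sketch is only the ordering: the hint is still written, and the transient replica is contacted on failure instead of after the timeout described above.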

      At the very least, we should discuss this before 4.0, even if we opt to take no action before 4.x.



          People

            Unassigned
            benedict Benedict Elliott Smith
