Cassandra / CASSANDRA-16545

Cluster topology change may produce false unavailable for queries


    Details

      Description

      When the coordinator processes a query, it first gets the ReplicationStrategy (RS) from the keyspace to decide which peers to contact. It then gets the RS a second time to perform the liveness check for the requested consistency level (CL).

      The RS is a volatile field in Keyspace, so those two getter calls can return different RS values when the cluster topology changes in between, e.g. when a node is added.

      In such a scenario, the check at the second step can throw an unexpected unavailable error, even though, from the perspective of the query, the cluster can satisfy the CL.
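To make the failure mode concrete, here is a toy model of the two reads. It is only a sketch: `MixedViewDemo`, `Strategy`, `Keyspace`, `selectPeers` and `isAvailable` are hypothetical names, not Cassandra's actual classes, and the topology change is simulated by swapping the field between the two steps instead of racing a real thread. The liveness check assumes QUORUM and counts only peers that are replicas under the RS it is given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: every name here is hypothetical and not Cassandra's real API.
final class MixedViewDemo {
    static final class Strategy {
        final List<String> replicas;
        Strategy(String... replicas) { this.replicas = Arrays.asList(replicas); }
        int quorum() { return replicas.size() / 2 + 1; }
    }

    static final class Keyspace {
        // Replaced when the (simulated) topology changes, like the volatile RS field.
        volatile Strategy strategy;
        Keyspace(Strategy initial) { this.strategy = initial; }
    }

    // Step 1: the peers to contact are the live replicas of the given strategy.
    static List<String> selectPeers(Strategy rs, List<String> liveNodes) {
        List<String> peers = new ArrayList<>(rs.replicas);
        peers.retainAll(liveNodes);
        return peers;
    }

    // Step 2: QUORUM liveness check; only peers that are replicas under rs count.
    static boolean isAvailable(Strategy rs, List<String> peers) {
        List<String> counted = new ArrayList<>(peers);
        counted.retainAll(rs.replicas);
        return counted.size() >= rs.quorum();
    }

    // Reads the volatile field twice, with a topology change landing in between.
    static boolean queryWithMixedView(Keyspace ks, Strategy afterChange, List<String> liveNodes) {
        List<String> peers = selectPeers(ks.strategy, liveNodes); // first read
        ks.strategy = afterChange;                                // simulated concurrent change
        return isAvailable(ks.strategy, peers);                   // second read sees a different RS
    }

    public static void main(String[] args) {
        List<String> live = Arrays.asList("A", "B", "C", "D", "E");
        Strategy before = new Strategy("A", "B", "C");
        Strategy after = new Strategy("A", "D", "E");
        System.out.println(queryWithMixedView(new Keyspace(before), after, live)); // mixed view
        System.out.println(isAvailable(before, selectPeers(before, live)));        // old RS only
        System.out.println(isAvailable(after, selectPeers(after, live)));          // new RS only
    }
}
```

In this model every node is alive, and the query is available under either RS taken on its own. Only the mixed view, with peers chosen from one RS and checked against the other, reports it as unavailable.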

      We should use a consistent view of the RS during peer selection and the CL liveness check. In other words, both steps should reference the same RS object. This is also clearer and easier for clients to reason about: such queries can be regarded as having been made before the topology change.
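A minimal sketch of the proposed approach, assuming a simplified model: the volatile field is read once into a local variable, and that snapshot is used for both steps. `ConsistentViewSketch`, `Strategy`, `Keyspace`, `query` and the `betweenSteps` hook are hypothetical names introduced for illustration. They do not correspond to Cassandra's real classes or to the actual patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: names do not correspond to Cassandra's real classes.
final class ConsistentViewSketch {
    static final class Strategy {
        final List<String> replicas;
        Strategy(String... replicas) { this.replicas = Arrays.asList(replicas); }
        int quorum() { return replicas.size() / 2 + 1; }
    }

    static final class Keyspace {
        volatile Strategy strategy; // may be replaced at any time by a topology change
        Keyspace(Strategy initial) { this.strategy = initial; }
    }

    // betweenSteps stands in for whatever happens concurrently between the two steps.
    static boolean query(Keyspace ks, List<String> liveNodes, Runnable betweenSteps) {
        Strategy rs = ks.strategy;                 // single volatile read: the query's snapshot

        List<String> peers = new ArrayList<>(rs.replicas);
        peers.retainAll(liveNodes);                // step 1: peer selection from the snapshot

        betweenSteps.run();                        // the keyspace's RS may change here

        return peers.size() >= rs.quorum();        // step 2: QUORUM check against the same snapshot
    }

    public static void main(String[] args) {
        List<String> live = Arrays.asList("A", "B", "C", "D", "E");
        Keyspace ks = new Keyspace(new Strategy("A", "B", "C"));
        // Simulate a topology change landing between peer selection and the liveness check.
        boolean available = query(ks, live, () -> ks.strategy = new Strategy("A", "D", "E"));
        System.out.println(available);
    }
}
```

Because both steps use the local `rs`, replacing the field mid-query no longer affects the outcome: the query behaves as if it had been issued before the change. Later queries pick up the new RS on their own single read.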


            People

            • Assignee: Yifan Cai (yifanc)
              Reporter: Yifan Cai (yifanc)
              Authors: Yifan Cai
              Reviewers: Aleksey Yeschenko, Andres de la Peña


                Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1h
