Cassandra / CASSANDRA-16271

Writes timeout instead of failing on cluster with CL-1 replicas available during replace


Details

    • Type: Bug
    • Status: Changes Suggested
    • Priority: Normal
    • Resolution: Unresolved
    • Component: Correctness - API / Semantic Implementation
    • Severity: Normal
    • Discovered By: User Report
    • Platform: All
      Added new unit tests. Run existing dtests.

    Description

      Writes timeout instead of failing on cluster with CL-1 replicas available during replace node operation.

      With consistency level ALL, we observe timeout exceptions during writes when (RF - 1) nodes are available in the cluster and a replace-node operation is running. The coordinator expects RF + 1 responses, while only RF nodes (RF - 1 in UN state and 1 in UJ state) are available in the cluster, so the write times out.

      The same problem happens on a keyspace with RF=1, CL=ONE and the only replica being replaced, and with RF=3, CL=QUORUM, one replica down and another being replaced.
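The required-response arithmetic behind all three scenarios can be sketched in a few lines (a simplified model for illustration only; `base_block_for` and `required_responses` are our names, not Cassandra's API):

```python
# Simplified model of how a write coordinator counts required responses.
# This is our illustration, not Cassandra's actual implementation.

def base_block_for(cl, rf):
    """Responses required by the consistency level alone."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[cl]

def required_responses(cl, rf, pending):
    # A joining (pending) replica, e.g. the replacement node in UJ state,
    # is counted on top of the consistency level's requirement.
    return base_block_for(cl, rf) + pending

# RF=3, CL=ALL, node2 down, node4 joining as its replacement:
# only 3 replicas (2 UN + 1 UJ) can ever answer, but 4 are required.
print(required_responses("ALL", 3, 1))     # 4

# RF=1, CL=ONE, the single replica being replaced:
# 1 response can arrive, but 2 are required.
print(required_responses("ONE", 1, 1))     # 2

# RF=3, CL=QUORUM, one replica down and another being replaced:
# only 2 replicas can answer, but 3 are required.
print(required_responses("QUORUM", 3, 1))  # 3
```

These counts match the `required_responses` values reported in the error messages below.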

      I believe the expected behavior is for the write to fail with an UnavailableException, since there are not enough NORMAL replicas to fulfill the request.
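The suggested fail-fast behavior amounts to checking, before dispatching the write, whether enough replicas can still respond (again a hedged sketch in the same simplified model; `UnavailableError` and the function name are illustrative, not Cassandra's code):

```python
# Illustrative sketch of an up-front availability check that would fail
# fast with Unavailable instead of waiting out the write timeout.
# Names and structure are ours, not Cassandra's.

class UnavailableError(Exception):
    pass

def assure_sufficient_live_nodes(live_responders, required):
    # If fewer replicas can possibly respond than the coordinator will
    # block for, the request can never succeed; fail immediately.
    if live_responders < required:
        raise UnavailableError(
            f"Cannot achieve consistency: required {required}, "
            f"alive {live_responders}")

# RF=3, CL=ALL: 2 UN + 1 UJ nodes can respond, but 4 responses are required.
try:
    assure_sufficient_live_nodes(live_responders=3, required=4)
except UnavailableError as e:
    print(e)  # fails fast instead of timing out
```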

      Steps to reproduce:

      Run a 3 node test cluster (call the nodes node1 (127.0.0.1), node2 (127.0.0.2), node3 (127.0.0.3)):

       ccm create test -v 3.11.3 -n 3 -s
      

      Create test keyspaces with RF = 3 and RF = 1 respectively:

       create keyspace rf3 with replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
       create keyspace rf1 with replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
      

      Create a table test in both the keyspaces:

      create table rf3.test (pk int PRIMARY KEY, value int);
      create table rf1.test (pk int PRIMARY KEY, value int);
      

      Stop node node2:

      ccm node2 stop
      

      Create node node4:

      ccm add node4 -i 127.0.0.4
      

      Enable auto_bootstrap

      ccm node4 updateconf 'auto_bootstrap: true'
      

      Ensure node4 does not have itself in its seeds list.

      Run a node replacement for node2 (address 127.0.0.2 corresponds to node2):

      ccm node4 start --jvm_arg="-Dcassandra.replace_address=127.0.0.2"
      

      While the replace is running, perform writes/reads with CONSISTENCY ALL; we observed TimeoutExceptions.

      cqlsh> CONSISTENCY ALL;
      cqlsh> insert into rf3.test (pk, value) values (16, 7);       
      
      WriteTimeout: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 3 responses." info={'received_responses': 3, 'required_responses': 4, 'consistency': 'ALL'}
      cqlsh> CONSISTENCY ONE; 
      cqlsh> insert into rf1.test (pk, value) VALUES(5, 1); 
      
      WriteTimeout: Error from server: code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 1 responses." info={'received_responses': 1, 'required_responses': 2, 'consistency': 'ONE'}
      

      Cluster State:

       Datacenter: datacenter1
      =======================
      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
      UN  127.0.0.1  70.45 KiB  1            100.0%            4f652b22-045b-493b-8722-fb5f7e1723ce  rack1
      UN  127.0.0.3  70.43 KiB  1            100.0%            a0dcd677-bdb3-4947-b9a7-14f3686a709f  rack1
      UJ  127.0.0.4  137.47 KiB  1            ?                 e3d794f1-081e-4aba-94f2-31950c713846  rack1
      

      Note:
      We introduced a sleep during the replace operation to keep it running long enough for our experiments. A code diff that does this is attached.

      Attachments


          People

            Assignee: Sam Tunnicliffe
            Reporter: Krishna Vadali
            Reviewers: Sam Tunnicliffe
            Authors: Krishna Vadali, Paulo Motta

            Dates

              Created:
              Updated:
