  1. Apache Cassandra
  2. CASSANDRA-19364

Data loss during decommission possible due to a delayed and unsynced pending ranges calculation

    Description

      This potential issue was discovered while inspecting flaky tests for CASSANDRA-18824. The pending ranges calculation (PRC) runs asynchronously when a node is decommissioned. If data is inserted during decommissioning and the pending ranges calculation is delayed for some reason (which can happen because it is not synchronous), we may end up with partial data loss. It is also possible that the test itself is simply wrong, so I see this ticket more as a memo for further investigation or discussion.

      Note that this has been fixed by Transactional Cluster Metadata (TCM).

      The test in question was:

              try (Cluster cluster = init(builder().withNodes(2)
                                                   .withTokenSupplier(evenlyDistributedTokens(2))
                                                   .withNodeIdTopology(NetworkTopology.singleDcNetworkTopology(2, "dc0", "rack0"))
                                                   .withConfig(config -> config.with(NETWORK, GOSSIP))
                                                   .start(), 1))
              {
                  IInvokableInstance nodeToDecommission = cluster.get(1);
                  IInvokableInstance nodeToRemainInCluster = cluster.get(2);
      
                  // Start decommission on nodeToDecommission
                  cluster.forEach(statusToDecommission(nodeToDecommission));
                  logger.info("Decommissioning node {}", nodeToDecommission.broadcastAddress());
      
                  // Add data to the cluster while the node is decommissioning
                  int numRows = 100;
                  cluster.schemaChange("CREATE TABLE IF NOT EXISTS " + KEYSPACE + ".tbl (pk int, ck int, v int, PRIMARY KEY (pk, ck))");
                  insertData(cluster, 1, numRows, ConsistencyLevel.ONE); // <-- HERE: when PRC is delayed, only ~50% of the inserted rows are found by the check below
      
                  // Check data before cleanup on nodeToRemainInCluster
                  assertEquals(numRows, nodeToRemainInCluster.executeInternal("SELECT * FROM " + KEYSPACE + ".tbl").length);
              }
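
      The suspected race can be illustrated with a toy model. This is only a sketch and not Cassandra code: the class PendingRangesSketch, its methods, and the even/odd ownership rule are hypothetical stand-ins for token ownership. It assumes a replication factor of 1 and that a coordinator adds the remaining node as a pending replica only once the pending ranges have been calculated.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model (hypothetical, not Cassandra code) of writes racing with a
// delayed pending ranges calculation during decommission.
class PendingRangesSketch {
    static final int LEAVING_NODE = 1;
    static final int REMAINING_NODE = 2;

    // Assumption: even keys belong to node 1, odd keys to node 2.
    // This stands in for an evenly split token ring.
    static int naturalOwner(int pk) {
        return pk % 2 == 0 ? LEAVING_NODE : REMAINING_NODE;
    }

    // Nodes that receive a write. The pending replica is included only
    // when the pending ranges calculation has already completed.
    static Set<Integer> writeTargets(int pk, boolean pendingRangesCalculated) {
        Set<Integer> targets = new HashSet<>();
        int owner = naturalOwner(pk);
        targets.add(owner);
        if (pendingRangesCalculated && owner == LEAVING_NODE) {
            targets.add(REMAINING_NODE);
        }
        return targets;
    }

    // Inserts keys 1..numRows and counts how many land on the remaining node.
    static int rowsOnRemainingNode(int numRows, boolean pendingRangesCalculated) {
        int count = 0;
        for (int pk = 1; pk <= numRows; pk++) {
            if (writeTargets(pk, pendingRangesCalculated).contains(REMAINING_NODE)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("PRC finished before writes: " + rowsOnRemainingNode(100, true));
        System.out.println("PRC delayed past writes:    " + rowsOnRemainingNode(100, false));
    }
}
```

      In this model the remaining node sees all 100 rows when the calculation finishes before the writes and only 50 when it is delayed, which is consistent with the ~50% observed in the test. Whether the real write path behaves this way is exactly what the ticket asks to investigate.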
      

            People

              Assignee: Unassigned
              Reporter: Jacek Lewandowski (jlewandowski)