Cassandra / CASSANDRA-13115

Read repair is not blocking repair to finish in foreground repair


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: 3.0.11, 3.10
    • Component/s: None
    • Labels:
      None
    • Environment:

      ccm on OSX

    • Severity:
      Normal

      Description

      In 3.X, the code that waits (blocks) for the repair results to come back is shown below:

      DataResolver.java
      public void close()
      {
          try
          {
              FBUtilities.waitOnFutures(repairResults, DatabaseDescriptor.getWriteRpcTimeout());
          }
          catch (TimeoutException ex)
          {
              // We got all responses, but timed out while repairing
              int blockFor = consistency.blockFor(keyspace);
              if (Tracing.isTracing())
                  Tracing.trace("Timed out while read-repairing after receiving all {} data and digest responses", blockFor);
              else
                  logger.debug("Timeout while read-repairing after receiving all {} data and digest responses", blockFor);

              throw new ReadTimeoutException(consistency, blockFor-1, blockFor, true);
          }
      }

      This close method lives in the DataResolver class, but it is never called, and it is not closed automatically either (RepairMergeListener does not extend AutoCloseable/CloseableIterator). As a result, we never wait for the repair to finish before returning the final result.
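
To make the difference concrete, here is a minimal, self-contained sketch. It is not the Cassandra code and not the committed fix. The names RepairListener, delayedAck, readWithoutClose and readWithClose are hypothetical, and the repair acknowledgements are simulated with plain CompletableFuture instances. It only illustrates that a close() method nobody calls cannot block the read, whereas an AutoCloseable used in try-with-resources does.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockingRepairSketch
{
    /** Hypothetical stand-in for a merge listener that tracks in-flight repair writes. */
    static final class RepairListener implements AutoCloseable
    {
        private final List<CompletableFuture<Void>> repairResults;
        private final long timeoutMillis;

        RepairListener(List<CompletableFuture<Void>> repairResults, long timeoutMillis)
        {
            this.repairResults = repairResults;
            this.timeoutMillis = timeoutMillis;
        }

        /** Blocks until every repair is acknowledged or the shared deadline passes. */
        @Override
        public void close() throws TimeoutException
        {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            for (CompletableFuture<Void> result : repairResults)
            {
                long remaining = Math.max(0L, deadline - System.nanoTime());
                try
                {
                    result.get(remaining, TimeUnit.NANOSECONDS);
                }
                catch (InterruptedException e)
                {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
                catch (ExecutionException e)
                {
                    throw new IllegalStateException(e.getCause());
                }
            }
        }
    }

    /** Simulates a replica acknowledging a repair write after the given delay. */
    static CompletableFuture<Void> delayedAck(long delayMillis)
    {
        return CompletableFuture.runAsync(() -> {
            try
            {
                Thread.sleep(delayMillis);
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
            }
        });
    }

    /** Mirrors the reported behaviour: the listener exists, but close() is never invoked. */
    static boolean readWithoutClose(List<CompletableFuture<Void>> repairs)
    {
        new RepairListener(repairs, 1000);
        return repairs.stream().allMatch(CompletableFuture::isDone);
    }

    /** try-with-resources guarantees close() runs, so the read blocks on the repairs. */
    static boolean readWithClose(List<CompletableFuture<Void>> repairs, long timeoutMillis) throws TimeoutException
    {
        try (RepairListener listener = new RepairListener(repairs, timeoutMillis))
        {
            // Row merging would happen here.
        }
        return repairs.stream().allMatch(CompletableFuture::isDone);
    }

    public static void main(String[] args) throws TimeoutException
    {
        System.out.println(readWithoutClose(Collections.singletonList(delayedAck(200))));
        System.out.println(readWithClose(Collections.singletonList(delayedAck(200)), 2000));
    }
}
```

Both methods return whether all simulated repairs had been acknowledged by the time the "read" returned. A timeout that is too short makes readWithClose throw TimeoutException, loosely analogous to the ReadTimeoutException path in the snippet above.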

      Steps to reproduce:
      1. Create a keyspace and table with RF = 2.
      2. Start 2 nodes using ccm.
      3. Stop node2.
      4. Disable hinted handoff on node1.
      5. Write some data to node1 at consistency level ONE.
      6. Start node2.
      7. Query the data from node1.
      This should trigger read repair. I added logging to the close method above, but the log message is never printed.

      So this bug basically violates the "monotonic quorum reads" guarantee.
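
As an illustration of what a non-monotonic read looks like, the toy model below uses three replicas (an assumption made for the example; the reproduction above uses RF = 2) and two consecutive quorum reads that contact different replica pairs. The classes Replica and MonotonicReadSketch and the blockOnRepair flag are hypothetical and do not correspond to Cassandra internals. When the repair write is deferred instead of applied before the read returns, the second read can observe an older value than the first.

```java
import java.util.ArrayList;
import java.util.List;

public class MonotonicReadSketch
{
    /** Toy replica holding one timestamped value; not Cassandra's storage model. */
    static final class Replica
    {
        long timestamp;
        String value;

        Replica(long timestamp, String value)
        {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Repairs that were sent but not waited for (never applied in this sketch). */
    private final List<Runnable> pendingRepairs = new ArrayList<>();

    /** Quorum read over two of three replicas: returns the newest value and repairs the stale one. */
    String quorumRead(Replica r1, Replica r2, boolean blockOnRepair)
    {
        Replica newest = r1.timestamp >= r2.timestamp ? r1 : r2;
        Replica stale = newest == r1 ? r2 : r1;
        final long ts = newest.timestamp;
        final String v = newest.value;
        if (stale.timestamp < ts)
        {
            Runnable repair = () -> { stale.timestamp = ts; stale.value = v; };
            if (blockOnRepair)
                repair.run();               // repair lands before the read returns
            else
                pendingRepairs.add(repair); // read returns while repair is still in flight
        }
        return v;
    }

    /** Runs two consecutive quorum reads against replica pairs (A, B) and then (B, C). */
    static String[] twoReads(boolean blockOnRepair)
    {
        Replica a = new Replica(2, "new"); // only A received the latest write
        Replica b = new Replica(1, "old");
        Replica c = new Replica(1, "old");
        MonotonicReadSketch coordinator = new MonotonicReadSketch();
        String first = coordinator.quorumRead(a, b, blockOnRepair);
        String second = coordinator.quorumRead(b, c, blockOnRepair);
        return new String[] { first, second };
    }

    public static void main(String[] args)
    {
        System.out.println(String.join(", ", twoReads(false)));
        System.out.println(String.join(", ", twoReads(true)));
    }
}
```

In this model the deferred repair is never applied, which exaggerates the race for clarity; in a real cluster the window is only as long as the in-flight repair write.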


            People

            • Assignee:
              slebresne Sylvain Lebresne
              Reporter:
              xiaolong302@gmail.com Xiaolong Jiang
              Authors:
              Sylvain Lebresne
              Reviewers:
              Jason Brown
